Dataset statistics
| Number of variables | 24 |
|---|---|
| Number of observations | 500710 |
| Missing cells | 5167226 |
| Missing cells (%) | 43.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 482.8 MiB |
| Average record size in memory | 1011.0 B |
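The two derived figures in this table can be checked against the raw counts. The missing-cell share is taken over the full grid of 24 variables × 500710 observations, and the average record size is the total memory divided by the number of rows. Below is a minimal arithmetic check. The constants are copied from the table, and 482.8 MiB is itself rounded, so the second result is only approximate:

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 24
N_OBSERVATIONS = 500_710
MISSING_CELLS = 5_167_226
TOTAL_MEMORY_MIB = 482.8  # already rounded to one decimal in the report


def missing_cell_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Percentage of all cells (variables x observations) that are missing."""
    return 100.0 * missing / (n_vars * n_obs)


def avg_record_bytes(total_mib: float, n_obs: int) -> float:
    """Average in-memory size of one row in bytes (1 MiB = 2**20 bytes)."""
    return total_mib * 2**20 / n_obs


print(f"Missing cells: {missing_cell_pct(MISSING_CELLS, N_VARIABLES, N_OBSERVATIONS):.1f}%")
print(f"Average record size: {avg_record_bytes(TOTAL_MEMORY_MIB, N_OBSERVATIONS):.0f} B")
```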
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 12 |
| URL | 3 |
| Unsupported | 1 |
Alerts
| Alert | Type |
|---|---|
| timestamp has a high cardinality: 492898 distinct values | High cardinality |
| text_lang_ft has a high cardinality: 2648 distinct values | High cardinality |
| text_normalized has a high cardinality: 355734 distinct values | High cardinality |
| links has a high cardinality: 95588 distinct values | High cardinality |
| hashtag has a high cardinality: 93371 distinct values | High cardinality |
| hashtag_lang has a high cardinality: 2344 distinct values | High cardinality |
| hashtag_en has a high cardinality: 92987 distinct values | High cardinality |
| cashtag has a high cardinality: 415 distinct values | High cardinality |
| media has a high cardinality: 104798 distinct values | High cardinality |
| mentioned_users has a high cardinality: 62698 distinct values | High cardinality |
| tweet_source has a high cardinality: 3579 distinct values | High cardinality |
| credibility is highly imbalanced (76.9%) | Imbalance |
| tweet_source is highly imbalanced (62.4%) | Imbalance |
| links has 349025 (69.7%) missing values | Missing |
| hashtag has 311862 (62.3%) missing values | Missing |
| hashtag_lang has 311885 (62.3%) missing values | Missing |
| hashtag_en has 311885 (62.3%) missing values | Missing |
| cashtag has 499478 (99.8%) missing values | Missing |
| media has 390417 (78.0%) missing values | Missing |
| image_url has 396933 (79.3%) missing values | Missing |
| video_url has 495073 (98.9%) missing values | Missing |
| GIF_url has 499852 (99.8%) missing values | Missing |
| reply_to_user has 433060 (86.5%) missing values | Missing |
| mentioned_users has 347742 (69.4%) missing values | Missing |
| quoted_tweet has 470842 (94.0%) missing values | Missing |
| credibility has 349025 (69.7%) missing values | Missing |
| likes is highly skewed (γ1 = 143.3287984) | Skewed |
| retweets is highly skewed (γ1 = 167.9977401) | Skewed |
| replies is highly skewed (γ1 = 403.2788348) | Skewed |
| quoted_by_count is highly skewed (γ1 = 269.4598466) | Skewed |
| timestamp is uniformly distributed | Uniform |
| media is uniformly distributed | Uniform |
| tweet_id has unique values | Unique |
| reply_to_user has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
| sentiment_polarity has 210275 (42.0%) zeros | Zeros |
| likes has 314331 (62.8%) zeros | Zeros |
| retweets has 359328 (71.8%) zeros | Zeros |
| replies has 428227 (85.5%) zeros | Zeros |
| quoted_by_count has 460723 (92.0%) zeros | Zeros |
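Two of the alert types above rest on simple statistics. The sketch below only illustrates them. It assumes that imbalance is one minus the normalized Shannon entropy of the category frequencies, which gives 0 for equally frequent categories and approaches 1 when one category dominates. It also assumes that γ1 is the adjusted Fisher-Pearson skewness that pandas `Series.skew()` returns. pandas-profiling's exact definitions and alert thresholds live in its own code and configuration and may differ.

```python
import math
from collections import Counter


def imbalance(values) -> float:
    """1 - normalized Shannon entropy of the category frequencies (assumed definition).

    Returns 0.0 for equally frequent categories and approaches 1.0 as a
    single category dominates. A single category is treated as 0.0 here.
    """
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(k)


def adjusted_skewness(xs) -> float:
    """Adjusted Fisher-Pearson skewness G1, the estimator behind pandas Series.skew()."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # biased third central moment
    if m2 == 0:
        return 0.0  # constant data: report zero skew rather than divide by zero
    g1 = m3 / m2**1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# A long right tail, like likes/retweets: mostly zeros plus a few large counts.
print(adjusted_skewness([0] * 95 + [1, 2, 5, 40, 900]))
# Hypothetical labels with one dominant class.
print(imbalance(["credible"] * 95 + ["not credible"] * 5))
```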
Reproduction
| Analysis started | 2023-04-06 20:42:03.028315 |
|---|---|
| Analysis finished | 2023-04-06 20:43:32.792085 |
| Duration | 1 minute and 29.76 seconds |
| Software version | pandas-profiling v3.6.6 |
| Download configuration | config.json |
user_id
Real number (ℝ)
| Distinct | 97988 |
|---|---|
| Distinct (%) | 19.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.7750193 × 10¹⁷ |
| Minimum | 521 |
|---|---|
| Maximum | 1.46 × 10¹⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 521 |
|---|---|
| 5-th percentile | 20751449 |
| Q1 | 1.919453 × 10⁸ |
| median | 1.4098525 × 10⁹ |
| Q3 | 7.68 × 10¹⁷ |
| 95-th percentile | 1.2 × 10¹⁸ |
| Maximum | 1.46 × 10¹⁸ |
| Range | 1.46 × 10¹⁸ |
| Interquartile range (IQR) | 7.68 × 10¹⁷ |
Descriptive statistics
| Standard deviation | 4.5318511 × 10¹⁷ |
|---|---|
| Coefficient of variation (CV) | 1.6330881 |
| Kurtosis | -0.47032404 |
| Mean | 2.7750193 × 10¹⁷ |
| Median Absolute Deviation (MAD) | 1.3651042 × 10⁹ |
| Skewness | 1.1320796 |
| Sum | 7.11702 × 10¹⁸ (inconsistent with Mean × n ≈ 1.39 × 10²³; likely int64 overflow) |
| Variance | 2.0537674 × 10³⁵ |
| Monotonicity | Not monotonic |
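The Sum row cannot be the true total. Every user_id is positive, and Mean × n is about 2.775 × 10¹⁷ × 500710 ≈ 1.39 × 10²³, far above the int64 maximum of about 9.22 × 10¹⁸. The most likely explanation is a fixed-width int64 accumulator that wraps silently on overflow, as NumPy integer sums do. This is an assumption: the report does not say how Sum is computed. The stdlib sketch below simulates that wraparound with Python's arbitrary-precision integers. The ID values are hypothetical but have the same magnitude as this column:

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(x: int) -> int:
    """Map an exact integer to the value a wrapping 64-bit signed register would hold."""
    return (x + 2**63) % 2**64 - 2**63


def int64_sum(values) -> int:
    """Sum values the way a fixed-width int64 accumulator would, wrapping on overflow."""
    acc = 0
    for v in values:
        acc = wrap_int64(acc + v)
    return acc


# Seven hypothetical IDs of the magnitude seen in this column (~1.4 x 10**18).
ids = [1_400_000_000_000_000_000] * 7
exact = sum(ids)          # Python ints are arbitrary precision: 9.8 x 10**18
wrapped = int64_sum(ids)  # exceeds INT64_MAX, so it wraps to a negative number
print(exact, wrapped)
```

To avoid the wraparound, sum with Python integers or convert to float64 before summing. Float64 loses low-order digits at this magnitude but keeps the sign and order of magnitude.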
| Value | Count | Frequency (%) |
| 7.76 × 10¹⁷ | 3564 | 0.7% |
| 1945935708 | 2908 | 0.6% |
| 1.12 × 10¹⁸ | 2738 | 0.5% |
| 9.95 × 10¹⁷ | 2715 | 0.5% |
| 1.26 × 10¹⁸ | 2503 | 0.5% |
| 1.22 × 10¹⁸ | 2348 | 0.5% |
| 8.91 × 10¹⁷ | 2312 | 0.5% |
| 1.09 × 10¹⁸ | 2239 | 0.4% |
| 1.1 × 10¹⁸ | 2230 | 0.4% |
| 8.84 × 10¹⁷ | 2187 | 0.4% |
| Other values (97978) | 474966 | 94.9% |
| Value | Count | Frequency (%) |
| 521 | 1 | < 0.1% |
| 1378 | 1 | < 0.1% |
| 2397 | 5 | < 0.1% |
| 2806 | 1 | < 0.1% |
| 3249 | 2 | < 0.1% |
| 3334 | 1 | < 0.1% |
| 3336 | 1 | < 0.1% |
| 5618 | 2 | < 0.1% |
| 5658 | 1 | < 0.1% |
| 6664 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1.46 × 10¹⁸ | 56 | < 0.1% |
| 1.45 × 10¹⁸ | 117 | < 0.1% |
| 1.44 × 10¹⁸ | 158 | < 0.1% |
| 1.43 × 10¹⁸ | 277 | 0.1% |
| 1.42 × 10¹⁸ | 250 | < 0.1% |
| 1.41 × 10¹⁸ | 299 | 0.1% |
| 1.4 × 10¹⁸ | 438 | 0.1% |
| 1.39 × 10¹⁸ | 440 | 0.1% |
| 1.38 × 10¹⁸ | 508 | 0.1% |
| 1.37 × 10¹⁸ | 610 | 0.1% |
timestamp
Categorical
Alerts: High cardinality, Uniform
| Distinct | 492898 |
|---|---|
| Distinct (%) | 98.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 39.2 MiB |
| 2016-09-05 19:25:39+00:00 | 33 |
|---|---|
| 2017-01-14 11:36:59+00:00 | 32 |
| 2017-05-11 11:56:12+00:00 | 15 |
| 2017-05-11 10:33:34+00:00 | 15 |
| 2017-05-13 16:29:34+00:00 | 13 |
| Other values (492893) | 500602 |
Length
| Max length | 25 |
|---|---|
| Median length | 25 |
| Mean length | 25 |
| Min length | 25 |
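Every timestamp is exactly 25 characters long. The character counts below are consistent with a single layout: `+` appears exactly once per row (500710 times), `-` twice per row, and `:` three times per row. The sketch below validates and parses that layout. It assumes the sample shape `YYYY-MM-DD HH:MM:SS+HH:MM` holds for every row, and the regular expression is our own, not part of the report:

```python
import re
from datetime import datetime, timezone

# Layout seen in the samples: "YYYY-MM-DD HH:MM:SS+HH:MM", 25 characters in total.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\+\d{2}:\d{2}")


def parse_timestamp(s: str) -> datetime:
    """Validate the fixed layout, then parse into a timezone-aware UTC datetime."""
    if TIMESTAMP_RE.fullmatch(s) is None:
        raise ValueError(f"unexpected timestamp format: {s!r}")
    # fromisoformat (Python 3.7+) accepts a space separator and a "+HH:MM" offset.
    return datetime.fromisoformat(s).astimezone(timezone.utc)


print(parse_timestamp("2013-09-03 02:22:09+00:00"))
```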
Characters and Unicode
| Total characters | 12517750 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 5 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 486347 |
|---|---|
| Unique (%) | 97.1% |
Sample
| 1st row | 2013-09-03 02:22:09+00:00 |
|---|---|
| 2nd row | 2013-09-03 02:22:11+00:00 |
| 3rd row | 2013-09-03 10:11:50+00:00 |
| 4th row | 2013-09-03 11:33:26+00:00 |
| 5th row | 2013-09-03 20:10:51+00:00 |
Common Values
| Value | Count | Frequency (%) |
| 2016-09-05 19:25:39+00:00 | 33 | < 0.1% |
| 2017-01-14 11:36:59+00:00 | 32 | < 0.1% |
| 2017-05-11 11:56:12+00:00 | 15 | < 0.1% |
| 2017-05-11 10:33:34+00:00 | 15 | < 0.1% |
| 2017-05-13 16:29:34+00:00 | 13 | < 0.1% |
| 2016-05-19 12:36:29+00:00 | 11 | < 0.1% |
| 2017-05-14 18:23:36+00:00 | 10 | < 0.1% |
| 2017-05-14 03:22:09+00:00 | 10 | < 0.1% |
| 2017-05-14 05:13:09+00:00 | 10 | < 0.1% |
| 2014-09-16 09:07:16+00:00 | 10 | < 0.1% |
| Other values (492888) | 500551 | > 99.9% |
Words
| Value | Count | Frequency (%) |
| 2017-05-14 | 9400 | 0.9% |
| 2017-05-15 | 7029 | 0.7% |
| 2019-04-26 | 4702 | 0.5% |
| 2019-05-12 | 4515 | 0.5% |
| 2017-05-13 | 3977 | 0.4% |
| 2017-05-12 | 3728 | 0.4% |
| 2017-05-16 | 3503 | 0.3% |
| 2019-04-27 | 2967 | 0.3% |
| 2019-04-25 | 2611 | 0.3% |
| 2019-05-18 | 2278 | 0.2% |
| Other values (88510) | 956710 | 95.5% |
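Each timestamp splits on its single space into a date token and a time token (1001420 words = 2 × 500710 rows). The date entries in this table are therefore effectively tweets per calendar day, for example 9400 on 2017-05-14. This reading assumes every offset is `+00:00`, as in the samples, so the dates are UTC days. Below is a minimal sketch that reproduces such a count, using three values from the Common Values table:

```python
from collections import Counter


def tweets_per_day(timestamps):
    """Count rows per calendar date from 'YYYY-MM-DD HH:MM:SS+00:00' strings.

    The date is the part before the single space, i.e. the same token the
    word table counts. With a +00:00 offset this is the UTC date.
    """
    return Counter(ts.split(" ", 1)[0] for ts in timestamps)


sample = [
    "2017-05-14 18:23:36+00:00",
    "2017-05-14 03:22:09+00:00",
    "2017-05-13 16:29:34+00:00",
]
print(tweets_per_day(sample).most_common())
```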
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 3792242 | 30.3% |
| : | 1502130 | 12.0% |
| 1 | 1389842 | 11.1% |
| 2 | 1276079 | 10.2% |
| - | 1001420 | 8.0% |
| (space) | 500710 | 4.0% |
| + | 500710 | 4.0% |
| 5 | 462279 | 3.7% |
| 3 | 434866 | 3.5% |
| 4 | 414797 | 3.3% |
| Other values (4) | 1242675 | 9.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9012780 | 72.0% |
| Other Punctuation | 1502130 | 12.0% |
| Dash Punctuation | 1001420 | 8.0% |
| Space Separator | 500710 | 4.0% |
| Math Symbol | 500710 | 4.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 3792242 | 42.1% |
| 1 | 1389842 | 15.4% |
| 2 | 1276079 | 14.2% |
| 5 | 462279 | 5.1% |
| 3 | 434866 | 4.8% |
| 4 | 414797 | 4.6% |
| 7 | 331685 | 3.7% |
| 9 | 330297 | 3.7% |
| 8 | 315500 | 3.5% |
| 6 | 265193 | 2.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 1502130 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1001420 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 500710 | 100.0% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 500710 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 12517750 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 3792242 | 30.3% |
| : | 1502130 | 12.0% |
| 1 | 1389842 | 11.1% |
| 2 | 1276079 | 10.2% |
| - | 1001420 | 8.0% |
| (space) | 500710 | 4.0% |
| + | 500710 | 4.0% |
| 5 | 462279 | 3.7% |
| 3 | 434866 | 3.5% |
| 4 | 414797 | 3.3% |
| Other values (4) | 1242675 | 9.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12517750 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 3792242 | 30.3% |
| : | 1502130 | 12.0% |
| 1 | 1389842 | 11.1% |
| 2 | 1276079 | 10.2% |
| - | 1001420 | 8.0% |
| (space) | 500710 | 4.0% |
| + | 500710 | 4.0% |
| 5 | 462279 | 3.7% |
| 3 | 434866 | 3.5% |
| 4 | 414797 | 3.3% |
| Other values (4) | 1242675 | 9.9% |
tweet_id
Real number (ℝ)
| Distinct | 500710 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0445043 × 10¹⁸ |
| Minimum | 3.7471893 × 10¹⁷ |
|---|---|
| Maximum | 1.4654691 × 10¹⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 3.7471893 × 10¹⁷ |
|---|---|
| 5-th percentile | 6.222359 × 10¹⁷ |
| Q1 | 8.6592314 × 10¹⁷ |
| median | 1.0553803 × 10¹⁸ |
| Q3 | 1.2169744 × 10¹⁸ |
| 95-th percentile | 1.4099342 × 10¹⁸ |
| Maximum | 1.4654691 × 10¹⁸ |
| Range | 1.0907502 × 10¹⁸ |
| Interquartile range (IQR) | 3.5105126 × 10¹⁷ |
Descriptive statistics
| Standard deviation | 2.3015116 × 10¹⁷ |
|---|---|
| Coefficient of variation (CV) | 0.22034487 |
| Kurtosis | -0.46774291 |
| Mean | 1.0445043 × 10¹⁸ |
| Median Absolute Deviation (MAD) | 1.8221344 × 10¹⁷ |
| Skewness | -0.23110444 |
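| Sum | -8.3502747 × 10¹⁸ (negative although every value is positive; Mean × n ≈ 5.23 × 10²³, so this is likely int64 overflow) |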
| Variance | 5.2969558 × 10³⁴ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3.747189287 × 10¹⁷ | 1 | < 0.1% |
| 1.134006431 × 10¹⁸ | 1 | < 0.1% |
| 1.133995094 × 10¹⁸ | 1 | < 0.1% |
| 1.133993357 × 10¹⁸ | 1 | < 0.1% |
| 1.133992337 × 10¹⁸ | 1 | < 0.1% |
| 1.133991685 × 10¹⁸ | 1 | < 0.1% |
| 1.133990799 × 10¹⁸ | 1 | < 0.1% |
| 1.133990297 × 10¹⁸ | 1 | < 0.1% |
| 1.133983426 × 10¹⁸ | 1 | < 0.1% |
| 1.133980361 × 10¹⁸ | 1 | < 0.1% |
| Other values (500700) | 500700 | > 99.9% |
| Value | Count | Frequency (%) |
| 3.747189287 × 10¹⁷ | 1 | < 0.1% |
| 3.747189379 × 10¹⁷ | 1 | < 0.1% |
| 3.748371279 × 10¹⁷ | 1 | < 0.1% |
| 3.748576657 × 10¹⁷ | 1 | < 0.1% |
| 3.749878767 × 10¹⁷ | 1 | < 0.1% |
| 3.749929245 × 10¹⁷ | 1 | < 0.1% |
| 3.750220232 × 10¹⁷ | 1 | < 0.1% |
| 3.750380101 × 10¹⁷ | 1 | < 0.1% |
| 3.750467486 × 10¹⁷ | 1 | < 0.1% |
| 3.750500781 × 10¹⁷ | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1.465469131 × 10¹⁸ | 1 | < 0.1% |
| 1.465467688 × 10¹⁸ | 1 | < 0.1% |
| 1.46546752 × 10¹⁸ | 1 | < 0.1% |
| 1.46546751 × 10¹⁸ | 1 | < 0.1% |
| 1.465465214 × 10¹⁸ | 1 | < 0.1% |
| 1.465464424 × 10¹⁸ | 1 | < 0.1% |
| 1.465458763 × 10¹⁸ | 1 | < 0.1% |
| 1.465458264 × 10¹⁸ | 1 | < 0.1% |
| 1.465454732 × 10¹⁸ | 1 | < 0.1% |
| 1.465452411 × 10¹⁸ | 1 | < 0.1% |
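The tweet_id values appear to be Twitter Snowflake IDs. A Snowflake ID stores a millisecond creation time in the bits above bit 22, counted from the epoch 1288834974657 ms (2010-11-04 01:42:54.657 UTC). Under that assumption, the minimum tweet_id, 3.747189287 × 10¹⁷, decodes to 2013-09-03 02:22:09 UTC, which matches the earliest timestamp in the sample rows. A sketch of the decoding follows. The input is the rounded minimum from the table, so the result is only accurate to about ±12 ms:

```python
from datetime import datetime, timezone

# Snowflake epoch used for tweet IDs: 2010-11-04 01:42:54.657 UTC, in milliseconds.
TWEPOCH_MS = 1288834974657


def snowflake_to_datetime(snowflake_id: int) -> datetime:
    """Recover the creation time encoded in a Snowflake ID.

    The low 22 bits (worker and sequence numbers) are shifted away; what remains
    is milliseconds since TWEPOCH_MS.
    """
    ms = (snowflake_id >> 22) + TWEPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Minimum tweet_id from the table, written out from its 10 significant digits.
print(snowflake_to_datetime(374_718_928_700_000_000))
```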
sentiment_polarity
Real number (ℝ)
| Distinct | 10228 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 49 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.10509341 |
| Minimum | -1 |
|---|---|
| Maximum | 0.9992 |
| Zeros | 210275 |
| Zeros (%) | 42.0% |
| Negative | 90044 |
| Negative (%) | 18.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -0.5719 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.3818 |
| 95-th percentile | 0.7269 |
| Maximum | 0.9992 |
| Range | 1.9992 |
| Interquartile range (IQR) | 0.3818 |
Descriptive statistics
| Standard deviation | 0.36287641 |
|---|---|
| Coefficient of variation (CV) | 3.4528942 |
| Kurtosis | 0.1553445 |
| Mean | 0.10509341 |
| Median Absolute Deviation (MAD) | 0.2023 |
| Skewness | -0.13871053 |
| Sum | 52616.169 |
| Variance | 0.13167929 |
| Monotonicity | Not monotonic |
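The zero and negative counts partition the scores by sign. Of the 500661 non-missing values, 210275 are exactly zero and 90044 are negative, which leaves 200342 (about 40.0% of all rows) positive. The large CV follows directly from its definition as standard deviation over mean: 0.36287641 / 0.10509341 ≈ 3.45. It is inflated because the mean sits close to zero. Below is a stdlib sketch of both calculations on illustrative values. It assumes the sample (n − 1) standard deviation, which is the pandas default:

```python
import statistics
from collections import Counter


def polarity_buckets(scores):
    """Count scores by sign, skipping missing values (None)."""
    def sign_label(s):
        if s < 0:
            return "negative"
        if s > 0:
            return "positive"
        return "zero"

    return Counter(sign_label(s) for s in scores if s is not None)


def coefficient_of_variation(xs):
    """Sample standard deviation (n - 1 denominator) divided by the mean."""
    mean = statistics.mean(xs)
    if mean == 0:
        raise ZeroDivisionError("CV is undefined when the mean is zero")
    return statistics.stdev(xs) / mean


# Illustrative scores taken from the tables above, plus one missing value.
scores = [0.4019, 0, -0.5719, None, 0.0, 0.3818]
print(polarity_buckets(scores))
print(coefficient_of_variation([s for s in scores if s is not None]))
```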
| Value | Count | Frequency (%) |
| 0 | 210275 | 42.0% |
| 0.4019 | 12194 | 2.4% |
| 0.3818 | 9643 | 1.9% |
| 0.296 | 6924 | 1.4% |
| 0.2732 | 6375 | 1.3% |
| 0.3612 | 6354 | 1.3% |
| 0.3182 | 6211 | 1.2% |
| 0.4215 | 6174 | 1.2% |
| 0.4404 | 5901 | 1.2% |
| 0.34 | 5894 | 1.2% |
| Other values (10218) | 224716 | 44.9% |
| Value | Count | Frequency (%) |
| -1 | 1 | < 0.1% |
| -0.9989 | 1 | < 0.1% |
| -0.9987 | 1 | < 0.1% |
| -0.9984 | 1 | < 0.1% |
| -0.9977 | 1 | < 0.1% |
| -0.9948 | 1 | < 0.1% |
| -0.9928 | 2 | < 0.1% |
| -0.9897 | 1 | < 0.1% |
| -0.9896 | 1 | < 0.1% |
| -0.9883 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.9992 | 1 | < 0.1% |
| 0.9985 | 1 | < 0.1% |
| 0.9982 | 1 | < 0.1% |
| 0.998 | 1 | < 0.1% |
| 0.9979 | 1 | < 0.1% |
| 0.997 | 1 | < 0.1% |
| 0.9965 | 1 | < 0.1% |
| 0.9964 | 1 | < 0.1% |
| 0.9963 | 3 | < 0.1% |
| 0.9955 | 1 | < 0.1% |
text_lang_ft
Categorical
| Distinct | 2648 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 49 |
| Missing (%) | < 0.1% |
| Memory size | 29.6 MiB |
| en 91 | 15340 |
|---|---|
| en 90 | 15193 |
| en 92 | 14877 |
| en 89 | 14795 |
| en 93 | 14699 |
| Other values (2643) | 425757 |
Length
| Max length | 6 |
|---|---|
| Median length | 5 |
| Mean length | 5.0003795 |
| Min length | 4 |
Characters and Unicode
| Total characters | 2503495 |
|---|---|
| Distinct characters | 36 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 853 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | en 88 |
|---|---|
| 2nd row | en 84 |
| 3rd row | en 47 |
| 4th row | en 65 |
| 5th row | en 56 |
Common Values
| Value | Count | Frequency (%) |
| en 91 | 15340 | 3.1% |
| en 90 | 15193 | 3.0% |
| en 92 | 14877 | 3.0% |
| en 89 | 14795 | 3.0% |
| en 93 | 14699 | 2.9% |
| en 88 | 14358 | 2.9% |
| en 94 | 14280 | 2.9% |
| en 87 | 13998 | 2.8% |
| en 86 | 13738 | 2.7% |
| en 85 | 13246 | 2.6% |
| Other values (2638) | 356137 | 71.1% |
Words
| Value | Count | Frequency (%) |
| en | 462168 | 46.2% |
| 91 | 16041 | 1.6% |
| 90 | 15819 | 1.6% |
| 92 | 15661 | 1.6% |
| 93 | 15551 | 1.6% |
| 89 | 15356 | 1.5% |
| 94 | 15234 | 1.5% |
| 88 | 14877 | 1.5% |
| 87 | 14559 | 1.5% |
| 86 | 14255 | 1.4% |
| Other values (173) | 401801 | 40.1% |
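Each text_lang_ft value is a language code and a number separated by a single space. There are 500661 spaces for 500661 non-missing values, and 1001322 = 2 × 500661 words. This is why the column has 2648 distinct values even though the code `en` alone accounts for 462168 rows, about 92.3% of the non-missing values. The report does not say what the number means. Assuming it is a per-tweet confidence score, splitting the two parts gives a low-cardinality language column plus a numeric column. A sketch (the function name `split_lang_label` is hypothetical):

```python
from collections import Counter
from typing import Optional, Tuple


def split_lang_label(value: Optional[str]) -> Tuple[Optional[str], Optional[int]]:
    """Split a label such as 'en 91' into ('en', 91); missing values give (None, None).

    Raises ValueError if the value is not exactly 'code<space>integer'.
    """
    if value is None:
        return None, None
    code, score = value.split(" ")
    return code, int(score)


# The first sample rows from the table above, plus one missing value.
labels = ["en 88", "en 84", "en 47", None]
parsed = [split_lang_label(v) for v in labels]
langs = Counter(code for code, _ in parsed if code is not None)
print(parsed)
print(langs)
```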
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 500661 | 20.0% |
| e | 468231 | 18.7% |
| n | 463210 | 18.5% |
| 8 | 186528 | 7.5% |
| 9 | 173046 | 6.9% |
| 7 | 150633 | 6.0% |
| 6 | 115737 | 4.6% |
| 5 | 89808 | 3.6% |
| 4 | 72826 | 2.9% |
| 3 | 62350 | 2.5% |
| Other values (26) | 220465 | 8.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1001496 | 40.0% |
| Lowercase Letter | 1001338 | 40.0% |
| Space Separator | 500661 | 20.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 468231 | 46.8% |
| n | 463210 | 46.3% |
| i | 16113 | 1.6% |
| d | 12273 | 1.2% |
| t | 11898 | 1.2% |
| r | 7261 | 0.7% |
| s | 4175 | 0.4% |
| a | 2734 | 0.3% |
| h | 2665 | 0.3% |
| f | 2473 | 0.2% |
| Other values (15) | 10305 | 1.0% |
Decimal Number
| Value | Count | Frequency (%) |
| 8 | 186528 | 18.6% |
| 9 | 173046 | 17.3% |
| 7 | 150633 | 15.0% |
| 6 | 115737 | 11.6% |
| 5 | 89808 | 9.0% |
| 4 | 72826 | 7.3% |
| 3 | 62350 | 6.2% |
| 2 | 53843 | 5.4% |
| 1 | 49634 | 5.0% |
| 0 | 47091 | 4.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 500661 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1502157 | 60.0% |
| Latin | 1001338 | 40.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 468231 | 46.8% |
| n | 463210 | 46.3% |
| i | 16113 | 1.6% |
| d | 12273 | 1.2% |
| t | 11898 | 1.2% |
| r | 7261 | 0.7% |
| s | 4175 | 0.4% |
| a | 2734 | 0.3% |
| h | 2665 | 0.3% |
| f | 2473 | 0.2% |
| Other values (15) | 10305 | 1.0% |
Common
| Value | Count | Frequency (%) |
| (space) | 500661 | 33.3% |
| 8 | 186528 | 12.4% |
| 9 | 173046 | 11.5% |
| 7 | 150633 | 10.0% |
| 6 | 115737 | 7.7% |
| 5 | 89808 | 6.0% |
| 4 | 72826 | 4.8% |
| 3 | 62350 | 4.2% |
| 2 | 53843 | 3.6% |
| 1 | 49634 | 3.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2503495 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 500661 | 20.0% |
| e | 468231 | 18.7% |
| n | 463210 | 18.5% |
| 8 | 186528 | 7.5% |
| 9 | 173046 | 6.9% |
| 7 | 150633 | 6.0% |
| 6 | 115737 | 4.6% |
| 5 | 89808 | 3.6% |
| 4 | 72826 | 2.9% |
| 3 | 62350 | 2.5% |
| Other values (26) | 220465 | 8.8% |
text_normalized
Categorical
| Distinct | 355734 |
|---|---|
| Distinct (%) | 71.1% |
| Missing | 49 |
| Missing (%) | < 0.1% |
| Memory size | 93.9 MiB |
| ['china', 'new', 'silk', 'road'] | 833 |
|---|---|
| ['one', 'belt', 'one', 'road'] | 726 |
| ['belt', 'road'] | 542 |
| ['new', 'silk', 'road'] | 523 |
| ['beltandroad'] | 422 |
| Other values (355729) | 497615 |
Length
| Max length | 2853 |
|---|---|
| Median length | 554 |
| Mean length | 135.6264 |
| Min length | 2 |
Characters and Unicode
| Total characters | 67902849 |
|---|---|
| Distinct characters | 3906 |
| Distinct categories | 12 |
| Distinct scripts | 32 |
| Distinct blocks | 41 |
Unique
| Unique | 319884 |
|---|---|
| Unique (%) | 63.9% |
Sample
| 1st row | ['nation', 'agree', 'build', 'new', 'silk', 'road', 'china', 'enhance', 'partnership', 'neighbor', 'west', 'aim'] |
|---|---|
| 2nd row | ['nation', 'agree', 'build', 'new', 'silk', 'road', 'china', 'enhance', 'partnership', 'neighbor', 'west', 'aim'] |
| 3rd row | ['high', 'speed', 'rail', 'china', 'new', 'silk', 'road', 'perspective'] |
| 4th row | ['nation', 'agree', 'build', 'new', 'silk', 'road'] |
| 5th row | ['china', 'kazakhstan', 'tajikistan', 'russia', 'mongolia', 'build', 'new', 'silk', 'road'] |
Common Values
| Value | Count | Frequency (%) |
| ['china', 'new', 'silk', 'road'] | 833 | 0.2% |
| ['one', 'belt', 'one', 'road'] | 726 | 0.1% |
| ['belt', 'road'] | 542 | 0.1% |
| ['new', 'silk', 'road'] | 523 | 0.1% |
| ['beltandroad'] | 422 | 0.1% |
| ['belt', 'road', 'initiative'] | 377 | 0.1% |
| [] | 365 | 0.1% |
| ['chinas', 'belt', 'road', 'plan', 'pakistan', 'take', 'military', 'turn'] | 362 | 0.1% |
| ['china', 'invest', '124bn', 'belt', 'road', 'global', 'trade', 'project'] | 289 | 0.1% |
| ['china', '900', 'billion', 'new', 'silk', 'road', 'need', 'know'] | 287 | 0.1% |
| Other values (355724) | 495935 | 99.0% |
Words
| Value | Count | Frequency (%) |
| road | 431688 | 6.3% |
| belt | 328416 | 4.8% |
| china | 271855 | 4.0% |
| one | 156334 | 2.3% |
| initiative | 140139 | 2.0% |
| new | 104540 | 1.5% |
| silk | 104467 | 1.5% |
| beltandroad | 83517 | 1.2% |
| project | 49189 | 0.7% |
| chinese | 47783 | 0.7% |
| Other values (182788) | 5137511 | 74.9% |
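text_normalized stores each tweet's token list as the string form of a Python list. That is why `'`, `,`, the space, `[` and `]` dominate the character tables. To count tokens rather than characters, each cell can be parsed back with `ast.literal_eval`, which accepts only literals and never runs arbitrary code. Below is a minimal sketch. The report's own word tokenizer may split these strings differently, so the counts need not match the table exactly:

```python
import ast
from collections import Counter


def parse_token_list(cell):
    """Turn a stored "['belt', 'road']" string back into a list of tokens."""
    if cell is None:  # missing value
        return []
    tokens = ast.literal_eval(cell)  # parses literals only; never executes code
    if not isinstance(tokens, list):
        raise ValueError(f"expected a list literal, got {type(tokens).__name__}")
    return tokens


rows = [
    "['china', 'new', 'silk', 'road']",
    "['one', 'belt', 'one', 'road']",
    "[]",
    None,
]
word_counts = Counter(tok for cell in rows for tok in parse_token_list(cell))
print(word_counts.most_common(3))
```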
Most occurring characters
| Value | Count | Frequency (%) |
| ' | 13710148 | 20.2% |
| , | 6354778 | 9.4% |
| (space) | 6354778 | 9.4% |
| e | 4371211 | 6.4% |
| a | 3895533 | 5.7% |
| i | 3675622 | 5.4% |
| n | 3272107 | 4.8% |
| t | 3061379 | 4.5% |
| o | 2972794 | 4.4% |
| r | 2874467 | 4.2% |
| Other values (3896) | 17360032 | 25.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 39962011 | 58.9% |
| Other Punctuation | 20064926 | 29.5% |
| Space Separator | 6354778 | 9.4% |
| Open Punctuation | 500661 | 0.7% |
| Close Punctuation | 500661 | 0.7% |
| Decimal Number | 384214 | 0.6% |
| Other Letter | 115550 | 0.2% |
| Connector Punctuation | 10055 | < 0.1% |
| Uppercase Letter | 9590 | < 0.1% |
| Modifier Letter | 190 | < 0.1% |
| Other values (2) | 213 | < 0.1% |
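The category names in these tables are the Unicode General Category values written out in full. For example, Ll is Lowercase Letter, Po is Other Punctuation, Zs is Space Separator, Ps and Pe are Open and Close Punctuation, and Lo is Other Letter. Python's `unicodedata.category` returns the same two-letter codes, so a per-category tally for any string can be sketched as follows. The code-to-name mapping is written by hand for this example and covers the categories that appear in this report:

```python
import unicodedata
from collections import Counter

# Hand-written mapping from General Category codes to the names used in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Lm": "Modifier Letter",
    "Nd": "Decimal Number",
    "No": "Other Number",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
    "Mn": "Nonspacing Mark",
}


def category_counts(text: str) -> Counter:
    """Tally characters by Unicode General Category, using long names where known."""
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in text
    )


print(category_counts("['belt', 'road', '一带一路']"))
```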
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 一 | 3932 | 3.4% |
| ا | 3725 | 3.2% |
| ی | 2520 | 2.2% |
| า | 2220 | 1.9% |
| ر | 2204 | 1.9% |
| 路 | 2070 | 1.8% |
| น | 1951 | 1.7% |
| ن | 1733 | 1.5% |
| ร | 1709 | 1.5% |
| 带 | 1645 | 1.4% |
| Other values (3391) | 91841 | 79.5% |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 4371211 | 10.9% |
| a | 3895533 | 9.7% |
| i | 3675622 | 9.2% |
| n | 3272107 | 8.2% |
| t | 3061379 | 7.7% |
| o | 2972794 | 7.4% |
| r | 2874467 | 7.2% |
| l | 2027982 | 5.1% |
| s | 1833421 | 4.6% |
| c | 1763768 | 4.4% |
| Other values (323) | 10213727 | 25.6% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 80799 | 21.0% |
| 0 | 79994 | 20.8% |
| 2 | 77291 | 20.1% |
| 3 | 26095 | 6.8% |
| 9 | 24618 | 6.4% |
| 5 | 22324 | 5.8% |
| 7 | 20952 | 5.5% |
| 4 | 19823 | 5.2% |
| 8 | 16127 | 4.2% |
| 6 | 15336 | 4.0% |
| Other values (68) | 855 | 0.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| I | 3708 | 38.7% |
| 𝗔 | 889 | 9.3% |
| 𝗧 | 716 | 7.5% |
| 𝗖 | 532 | 5.5% |
| 𝗬 | 489 | 5.1% |
| 𝗦 | 485 | 5.1% |
| 𝗢 | 484 | 5.0% |
| 𝗣 | 468 | 4.9% |
| 𝗨 | 288 | 3.0% |
| 𝗗 | 282 | 2.9% |
| Other values (47) | 1249 | 13.0% |
Other Number
| Value | Count | Frequency (%) |
| ⒍ | 11 | 10.1% |
| ⒎ | 11 | 10.1% |
| ⒏ | 11 | 10.1% |
| ⒐ | 11 | 10.1% |
| ⒑ | 11 | 10.1% |
| ² | 5 | 4.6% |
| ① | 5 | 4.6% |
| ½ | 4 | 3.7% |
| ③ | 4 | 3.7% |
| ④ | 4 | 3.7% |
| Other values (14) | 32 | 29.4% |
Modifier Letter
| Value | Count | Frequency (%) |
| ー | 135 | 71.1% |
| ๆ | 48 | 25.3% |
| 々 | 3 | 1.6% |
| ʽ | 2 | 1.1% |
| ˈ | 1 | 0.5% |
| ໆ | 1 | 0.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 13710148 | 68.3% |
| , | 6354778 | 31.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 6354778 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 500661 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 500661 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 10055 | 100.0% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ̇ | 104 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 39917656 | 58.8% |
| Common | 27825292 | 41.0% |
| Greek | 34183 | 0.1% |
| Han | 31677 | < 0.1% |
| Arabic | 26015 | < 0.1% |
| Thai | 24318 | < 0.1% |
| Cyrillic | 9420 | < 0.1% |
| Devanagari | 8179 | < 0.1% |
| Myanmar | 7687 | < 0.1% |
| Hebrew | 4320 | < 0.1% |
| Other values (22) | 14102 | < 0.1% |
Most frequent character per script
Han
| Value | Count | Frequency (%) |
| 一 | 3932 | 12.4% |
| 路 | 2070 | 6.5% |
| 带 | 1645 | 5.2% |
| 中 | 761 | 2.4% |
| 国 | 753 | 2.4% |
| 的 | 458 | 1.4% |
| 自 | 391 | 1.2% |
| 来 | 304 | 1.0% |
| 新 | 247 | 0.8% |
| 帶 | 196 | 0.6% |
| Other values (2331) | 20920 | 66.0% |
Common
| Value | Count | Frequency (%) |
| ' | 13710148 | 49.3% |
| , | 6354778 | 22.8% |
| (space) | 6354778 | 22.8% |
| [ | 500661 | 1.8% |
| ] | 500661 | 1.8% |
| 1 | 80799 | 0.3% |
| 0 | 79994 | 0.3% |
| 2 | 77291 | 0.3% |
| 3 | 26095 | 0.1% |
| 9 | 24618 | 0.1% |
| Other values (198) | 115469 | 0.4% |
Arabic
| Value | Count | Frequency (%) |
| ا | 3725 | 14.3% |
| ی | 2520 | 9.7% |
| ر | 2204 | 8.5% |
| ن | 1733 | 6.7% |
| و | 1484 | 5.7% |
| ه | 1292 | 5.0% |
| د | 1277 | 4.9% |
| م | 1239 | 4.8% |
| ت | 1235 | 4.7% |
| ب | 992 | 3.8% |
| Other values (141) | 8314 | 32.0% |
Latin
| Value | Count | Frequency (%) |
| e | 4371211 | 11.0% |
| a | 3895533 | 9.8% |
| i | 3675622 | 9.2% |
| n | 3272107 | 8.2% |
| t | 3061379 | 7.7% |
| o | 2972794 | 7.4% |
| r | 2874467 | 7.2% |
| l | 2027982 | 5.1% |
| s | 1833421 | 4.6% |
| c | 1763768 | 4.4% |
| Other values (137) | 10169372 | 25.5% |
Hangul
| Value | Count | Frequency (%) |
| 공 | 47 | 9.3% |
| 이 | 43 | 8.5% |
| 유 | 39 | 7.7% |
| 님 | 37 | 7.3% |
| 행 | 14 | 2.8% |
| 여 | 14 | 2.8% |
| 출 | 13 | 2.6% |
| 처 | 13 | 2.6% |
| 티 | 12 | 2.4% |
| 켓 | 10 | 2.0% |
| Other values (124) | 264 | 52.2% |
Ethiopic
| Value | Count | Frequency (%) |
| ን | 35 | 5.5% |
| ት | 35 | 5.5% |
| ይ | 28 | 4.4% |
| ስ | 23 | 3.6% |
| ው | 20 | 3.1% |
| ና | 19 | 3.0% |
| ር | 18 | 2.8% |
| ም | 17 | 2.7% |
| አ | 17 | 2.7% |
| ቻ | 16 | 2.5% |
| Other values (109) | 407 | 64.1% |
Katakana
| Value | Count | Frequency (%) |
| イ | 53 | 8.3% |
| ス | 53 | 8.3% |
| ル | 50 | 7.8% |
| コ | 43 | 6.7% |
| ロ | 43 | 6.7% |
| ウ | 38 | 5.9% |
| ナ | 37 | 5.8% |
| ン | 27 | 4.2% |
| ア | 17 | 2.7% |
| ジ | 12 | 1.9% |
| Other values (63) | 268 | 41.8% |
Hiragana
| Value | Count | Frequency (%) |
| の | 69 | 10.9% |
| で | 43 | 6.8% |
| が | 36 | 5.7% |
| は | 34 | 5.4% |
| に | 29 | 4.6% |
| い | 28 | 4.4% |
| と | 26 | 4.1% |
| る | 26 | 4.1% |
| す | 20 | 3.1% |
| を | 20 | 3.1% |
| Other values (53) | 304 | 47.9% |
Myanmar
| Value | Count | Frequency (%) |
| င | 858 | 11.2% |
| တ | 785 | 10.2% |
| က | 724 | 9.4% |
| မ | 676 | 8.8% |
| န | 533 | 6.9% |
| ပ | 514 | 6.7% |
| ရ | 457 | 5.9% |
| စ | 374 | 4.9% |
| အ | 358 | 4.7% |
| သ | 284 | 3.7% |
| Other values (49) | 2124 | 27.6% |
Thai
| Value | Count | Frequency (%) |
| า | 2220 | 9.1% |
| น | 1951 | 8.0% |
| ร | 1709 | 7.0% |
| ง | 1306 | 5.4% |
| เ | 1227 | 5.0% |
| ก | 1198 | 4.9% |
| ม | 1124 | 4.6% |
| อ | 1117 | 4.6% |
| ย | 1010 | 4.2% |
| ท | 916 | 3.8% |
| Other values (44) | 10540 | 43.3% |
Devanagari
| Value | Count | Frequency (%) |
| र | 958 | 11.7% |
| न | 927 | 11.3% |
| क | 737 | 9.0% |
| म | 462 | 5.6% |
| स | 455 | 5.6% |
| ल | 438 | 5.4% |
| त | 374 | 4.6% |
| प | 357 | 4.4% |
| य | 286 | 3.5% |
| ह | 283 | 3.5% |
| Other values (43) | 2902 | 35.5% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 1078 | 11.4% |
| н | 685 | 7.3% |
| о | 557 | 5.9% |
| э | 511 | 5.4% |
| р | 495 | 5.3% |
| л | 481 | 5.1% |
| и | 476 | 5.1% |
| д | 440 | 4.7% |
| г | 429 | 4.6% |
| т | 397 | 4.2% |
| Other values (41) | 3871 | 41.1% |
Khmer
| Value | Count | Frequency (%) |
| រ | 117 | 10.5% |
| ក | 100 | 9.0% |
| ន | 89 | 8.0% |
| ម | 81 | 7.3% |
| ត | 72 | 6.5% |
| ស | 63 | 5.7% |
| ល | 60 | 5.4% |
| ប | 60 | 5.4% |
| ជ | 57 | 5.1% |
| ព | 45 | 4.0% |
| Other values (33) | 371 | 33.3% |
Sinhala
| Value | Count | Frequency (%) |
| න | 474 | 12.1% |
| ව | 353 | 9.0% |
| ක | 325 | 8.3% |
| ය | 316 | 8.1% |
| ර | 287 | 7.3% |
| ත | 268 | 6.9% |
| ම | 266 | 6.8% |
| ස | 200 | 5.1% |
| ල | 158 | 4.0% |
| ප | 152 | 3.9% |
| Other values (31) | 1107 | 28.3% |
Lao
| Value | Count | Frequency (%) |
| າ | 134 | 11.3% |
| ນ | 116 | 9.8% |
| ງ | 84 | 7.1% |
| ລ | 66 | 5.6% |
| ກ | 63 | 5.3% |
| ທ | 61 | 5.1% |
| ເ | 60 | 5.1% |
| ສ | 57 | 4.8% |
| ວ | 56 | 4.7% |
| ດ | 44 | 3.7% |
| Other values (30) | 446 | 37.6% |
Greek
| Value | Count | Frequency (%) |
| α | 3120 | 9.1% |
| ο | 3006 | 8.8% |
| τ | 2797 | 8.2% |
| ι | 2224 | 6.5% |
| ε | 2199 | 6.4% |
| ν | 1945 | 5.7% |
| σ | 1725 | 5.0% |
| ρ | 1606 | 4.7% |
| η | 1482 | 4.3% |
| ς | 1339 | 3.9% |
| Other values (25) | 12740 | 37.3% |
Bengali
| Value | Count | Frequency (%) |
| ন | 28 | 10.8% |
| ৰ | 24 | 9.3% |
| ব | 19 | 7.3% |
| ত | 19 | 7.3% |
| ক | 18 | 6.9% |
| ল | 16 | 6.2% |
| য | 15 | 5.8% |
| ম | 12 | 4.6% |
| শ | 10 | 3.9% |
| চ | 9 | 3.5% |
| Other values (25) | 89 | 34.4% |
Tamil
| Value | Count | Frequency (%) |
| த | 499 | 13.2% |
| க | 483 | 12.8% |
| ட | 369 | 9.8% |
| ன | 290 | 7.7% |
| ப | 230 | 6.1% |
| ம | 222 | 5.9% |
| ல | 210 | 5.5% |
| வ | 200 | 5.3% |
| ச | 193 | 5.1% |
| ர | 183 | 4.8% |
| Other values (24) | 905 | 23.9% |
Oriya
| Value | Count | Frequency (%) |
| ର | 32 | 17.6% |
| ନ | 16 | 8.8% |
| ବ | 13 | 7.1% |
| କ | 10 | 5.5% |
| ପ | 10 | 5.5% |
| ଇ | 10 | 5.5% |
| ତ | 7 | 3.8% |
| ହ | 6 | 3.3% |
| ଜ | 5 | 2.7% |
| ଦ | 5 | 2.7% |
| Other values (24) | 68 | 37.4% |
Gujarati
| Value | Count | Frequency (%) |
| ન | 56 | 18.8% |
| ર | 34 | 11.4% |
| મ | 21 | 7.0% |
| લ | 18 | 6.0% |
| વ | 18 | 6.0% |
| ક | 14 | 4.7% |
| ટ | 13 | 4.4% |
| જ | 11 | 3.7% |
| સ | 10 | 3.4% |
| ડ | 9 | 3.0% |
| Other values (20) | 94 | 31.5% |
Kannada
| Value | Count | Frequency (%) |
| ರ | 36 | 12.4% |
| ನ | 28 | 9.6% |
| ಲ | 25 | 8.6% |
| ತ | 20 | 6.9% |
| ದ | 15 | 5.2% |
| ಕ | 13 | 4.5% |
| ಯ | 12 | 4.1% |
| ಪ | 12 | 4.1% |
| ಗ | 12 | 4.1% |
| ಸ | 12 | 4.1% |
| Other values (20) | 106 | 36.4% |
Tibetan
| Value | Count | Frequency (%) |
| ག | 42 | 20.9% |
| ས | 24 | 11.9% |
| ར | 21 | 10.4% |
| ད | 20 | 10.0% |
| ལ | 11 | 5.5% |
| འ | 11 | 5.5% |
| མ | 10 | 5.0% |
| ཅ | 9 | 4.5% |
| བ | 9 | 4.5% |
| ན | 7 | 3.5% |
| Other values (18) | 37 | 18.4% |
Hebrew
| Value | Count | Frequency (%) |
| י | 527 | 12.2% |
| ו | 411 | 9.5% |
| ה | 398 | 9.2% |
| ל | 298 | 6.9% |
| ת | 243 | 5.6% |
| ר | 238 | 5.5% |
| מ | 238 | 5.5% |
| ש | 223 | 5.2% |
| א | 218 | 5.0% |
| ב | 191 | 4.4% |
| Other values (17) | 1335 | 30.9% |
Telugu
| Value | Count | Frequency (%) |
| ర | 35 | 13.4% |
| న | 32 | 12.2% |
| ల | 21 | 8.0% |
| మ | 18 | 6.9% |
| చ | 17 | 6.5% |
| క | 15 | 5.7% |
| వ | 13 | 5.0% |
| ద | 13 | 5.0% |
| స | 12 | 4.6% |
| బ | 10 | 3.8% |
| Other values (15) | 76 | 29.0% |
Syloti_Nagri
| Value | Count | Frequency (%) |
| ꠞ | 6 | 18.2% |
| ꠟ | 4 | 12.1% |
| ꠘ | 3 | 9.1% |
| ꠛ | 2 | 6.1% |
| ꠖ | 2 | 6.1% |
| ꠡ | 2 | 6.1% |
| ꠌ | 2 | 6.1% |
| ꠝ | 2 | 6.1% |
| ꠐ | 2 | 6.1% |
| ꠚ | 1 | 3.0% |
| Other values (7) | 7 | 21.2% |
Canadian_Aboriginal
| Value | Count | Frequency (%) |
| ᖇ | 3 | 13.0% |
| ᑌ | 3 | 13.0% |
| ᖗ | 2 | 8.7% |
| ᖘ | 2 | 8.7% |
| ᗩ | 2 | 8.7% |
| ᔕ | 2 | 8.7% |
| ᑎ | 1 | 4.3% |
| ᕼ | 1 | 4.3% |
| ᗪ | 1 | 4.3% |
| ᑭ | 1 | 4.3% |
| Other values (5) | 5 | 21.7% |
Mongolian
| Value | Count | Frequency (%) |
| ᡝ | 3 | 37.5% |
| ᠴ | 2 | 25.0% |
| ᠵ | 1 | 12.5% |
| ᡳ | 1 | 12.5% |
| ᠨ | 1 | 12.5% |
Georgian
| Value | Count | Frequency (%) |
| ღ | 2 | 28.6% |
| ი | 2 | 28.6% |
| ნ | 1 | 14.3% |
| ს | 1 | 14.3% |
| ა | 1 | 14.3% |
Armenian
| Value | Count | Frequency (%) |
| է | 12 | 75.0% |
| ե | 2 | 12.5% |
| հ | 1 | 6.2% |
| պ | 1 | 6.2% |
Malayalam
| Value | Count | Frequency (%) |
| മ | 1 | 33.3% |
| ഖ | 1 | 33.3% |
| ന | 1 | 33.3% |
Inherited
| Value | Count | Frequency (%) |
| ̇ | 104 | 100.0% |
Bopomofo
| Value | Count | Frequency (%) |
| ㄧ | 6 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 67717532 | 99.7% |
| None | 49016 | 0.1% |
| CJK | 31673 | < 0.1% |
| Arabic | 25710 | < 0.1% |
| Thai | 24318 | < 0.1% |
| Math Alphanum | 10535 | < 0.1% |
| Cyrillic | 9420 | < 0.1% |
| Devanagari | 8179 | < 0.1% |
| Myanmar | 7687 | < 0.1% |
| Hebrew | 4320 | < 0.1% |
| Other values (31) | 14459 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| ' | 13710148 | 20.2% |
| , | 6354778 | 9.4% |
| (space) | 6354778 | 9.4% |
| e | 4371211 | 6.5% |
| a | 3895533 | 5.8% |
| i | 3675622 | 5.4% |
| n | 3272107 | 4.8% |
| t | 3061379 | 4.5% |
| o | 2972794 | 4.4% |
| r | 2874467 | 4.2% |
| Other values (33) | 17174715 | 25.4% |
CJK
| Value | Count | Frequency (%) |
| 一 | 3932 | 12.4% |
| 路 | 2070 | 6.5% |
| 带 | 1645 | 5.2% |
| 中 | 761 | 2.4% |
| 国 | 753 | 2.4% |
| 的 | 458 | 1.4% |
| 自 | 391 | 1.2% |
| 来 | 304 | 1.0% |
| 新 | 247 | 0.8% |
| 帶 | 196 | 0.6% |
| Other values (2329) | 20916 | 66.0% |
Arabic
| Value | Count | Frequency (%) |
| ا | 3725 | 14.5% |
| ی | 2520 | 9.8% |
| ر | 2204 | 8.6% |
| ن | 1733 | 6.7% |
| و | 1484 | 5.8% |
| ه | 1292 | 5.0% |
| د | 1277 | 5.0% |
| م | 1239 | 4.8% |
| ت | 1235 | 4.8% |
| ب | 992 | 3.9% |
| Other values (69) | 8009 | 31.2% |
None
| Value | Count | Frequency (%) |
| α | 3120 | 6.4% |
| ο | 3006 | 6.1% |
| τ | 2797 | 5.7% |
| ι | 2224 | 4.5% |
| ε | 2199 | 4.5% |
| ν | 1945 | 4.0% |
| σ | 1725 | 3.5% |
| í | 1642 | 3.3% |
| ã | 1619 | 3.3% |
| é | 1615 | 3.3% |
| Other values (209) | 27124 | 55.3% |
Thai
| Value | Count | Frequency (%) |
| า | 2220 | 9.1% |
| น | 1951 | 8.0% |
| ร | 1709 | 7.0% |
| ง | 1306 | 5.4% |
| เ | 1227 | 5.0% |
| ก | 1198 | 4.9% |
| ม | 1124 | 4.6% |
| อ | 1117 | 4.6% |
| ย | 1010 | 4.2% |
| ท | 916 | 3.8% |
| Other values (44) | 10540 | 43.3% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 1078 | 11.4% |
| н | 685 | 7.3% |
| о | 557 | 5.9% |
| э | 511 | 5.4% |
| р | 495 | 5.3% |
| л | 481 | 5.1% |
| и | 476 | 5.1% |
| д | 440 | 4.7% |
| г | 429 | 4.6% |
| т | 397 | 4.2% |
| Other values (41) | 3871 | 41.1% |
Devanagari
| Value | Count | Frequency (%) |
| र | 958 | 11.7% |
| न | 927 | 11.3% |
| क | 737 | 9.0% |
| म | 462 | 5.6% |
| स | 455 | 5.6% |
| ल | 438 | 5.4% |
| त | 374 | 4.6% |
| प | 357 | 4.4% |
| य | 286 | 3.5% |
| ह | 283 | 3.5% |
| Other values (43) | 2902 | 35.5% |
Math Alphanum
| Value | Count | Frequency (%) |
| 𝗔 | 889 | 8.4% |
| 𝗧 | 716 | 6.8% |
| 𝗲 | 682 | 6.5% |
| 𝘀 | 666 | 6.3% |
| 𝗖 | 532 | 5.0% |
| 𝗬 | 489 | 4.6% |
| 𝗦 | 485 | 4.6% |
| 𝗢 | 484 | 4.6% |
| 𝗣 | 468 | 4.4% |
| 𝗮 | 376 | 3.6% |
| Other values (144) | 4748 | 45.1% |
Myanmar
| Value | Count | Frequency (%) |
| င | 858 | 11.2% |
| တ | 785 | 10.2% |
| က | 724 | 9.4% |
| မ | 676 | 8.8% |
| န | 533 | 6.9% |
| ပ | 514 | 6.7% |
| ရ | 457 | 5.9% |
| စ | 374 | 4.9% |
| အ | 358 | 4.7% |
| သ | 284 | 3.7% |
| Other values (49) | 2124 | 27.6% |
Hebrew
| Value | Count | Frequency (%) |
| י | 527 | 12.2% |
| ו | 411 | 9.5% |
| ה | 398 | 9.2% |
| ל | 298 | 6.9% |
| ת | 243 | 5.6% |
| ר | 238 | 5.5% |
| מ | 238 | 5.5% |
| ש | 223 | 5.2% |
| א | 218 | 5.0% |
| ב | 191 | 4.4% |
| Other values (17) | 1335 | 30.9% |
Tamil
| Value | Count | Frequency (%) |
| த | 499 | 13.2% |
| க | 483 | 12.8% |
| ட | 369 | 9.8% |
| ன | 290 | 7.7% |
| ப | 230 | 6.1% |
| ம | 222 | 5.9% |
| ல | 210 | 5.5% |
| வ | 200 | 5.3% |
| ச | 193 | 5.1% |
| ர | 183 | 4.8% |
| Other values (24) | 905 | 23.9% |
Sinhala
| Value | Count | Frequency (%) |
| න | 474 | 12.1% |
| ව | 353 | 9.0% |
| ක | 325 | 8.3% |
| ය | 316 | 8.1% |
| ර | 287 | 7.3% |
| ත | 268 | 6.9% |
| ම | 266 | 6.8% |
| ස | 200 | 5.1% |
| ල | 158 | 4.0% |
| ප | 152 | 3.9% |
| Other values (31) | 1107 | 28.3% |
Katakana
| Value | Count | Frequency (%) |
| ー | 135 | 17.6% |
| イ | 53 | 6.9% |
| ス | 53 | 6.9% |
| ル | 50 | 6.5% |
| コ | 43 | 5.6% |
| ロ | 43 | 5.6% |
| ウ | 38 | 4.9% |
| ナ | 37 | 4.8% |
| ン | 27 | 3.5% |
| ア | 17 | 2.2% |
| Other values (59) | 272 |
Lao
| Value | Count | Frequency (%) |
| າ | 134 | 11.3% |
| ນ | 116 | 9.8% |
| ງ | 84 | 7.1% |
| ລ | 66 | 5.6% |
| ກ | 63 | 5.3% |
| ທ | 61 | 5.1% |
| ເ | 60 | 5.1% |
| ສ | 57 | 4.8% |
| ວ | 56 | 4.7% |
| ດ | 44 | 3.7% |
| Other values (30) | 446 |
Khmer
| Value | Count | Frequency (%) |
| រ | 117 | 10.5% |
| ក | 100 | 9.0% |
| ន | 89 | 8.0% |
| ម | 81 | 7.3% |
| ត | 72 | 6.5% |
| ស | 63 | 5.7% |
| ល | 60 | 5.4% |
| ប | 60 | 5.4% |
| ជ | 57 | 5.1% |
| ព | 45 | 4.0% |
| Other values (33) | 371 |
Diacriticals
| Value | Count | Frequency (%) |
| ̇ | 104 | 100.0% |
Hiragana
| Value | Count | Frequency (%) |
| の | 69 | 10.9% |
| で | 43 | 6.8% |
| が | 36 | 5.7% |
| は | 34 | 5.4% |
| に | 29 | 4.6% |
| い | 28 | 4.4% |
| と | 26 | 4.1% |
| る | 26 | 4.1% |
| す | 20 | 3.1% |
| を | 20 | 3.1% |
| Other values (53) | 304 |
Gujarati
| Value | Count | Frequency (%) |
| ન | 56 | 18.8% |
| ર | 34 | 11.4% |
| મ | 21 | 7.0% |
| લ | 18 | 6.0% |
| વ | 18 | 6.0% |
| ક | 14 | 4.7% |
| ટ | 13 | 4.4% |
| જ | 11 | 3.7% |
| સ | 10 | 3.4% |
| ડ | 9 | 3.0% |
| Other values (20) | 94 |
Hangul
| Value | Count | Frequency (%) |
| 공 | 47 | 9.3% |
| 이 | 43 | 8.5% |
| 유 | 39 | 7.7% |
| 님 | 37 | 7.3% |
| 행 | 14 | 2.8% |
| 여 | 14 | 2.8% |
| 출 | 13 | 2.6% |
| 처 | 13 | 2.6% |
| 티 | 12 | 2.4% |
| 켓 | 10 | 2.0% |
| Other values (124) | 264 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ℑ | 45 | 86.5% |
| ℹ | 7 | 13.5% |
Tibetan
| Value | Count | Frequency (%) |
| ག | 42 | 20.9% |
| ས | 24 | 11.9% |
| ར | 21 | 10.4% |
| ད | 20 | 10.0% |
| ལ | 11 | 5.5% |
| འ | 11 | 5.5% |
| མ | 10 | 5.0% |
| ཅ | 9 | 4.5% |
| བ | 9 | 4.5% |
| ན | 7 | 3.5% |
| Other values (18) | 37 |
Kannada
| Value | Count | Frequency (%) |
| ರ | 36 | 12.4% |
| ನ | 28 | 9.6% |
| ಲ | 25 | 8.6% |
| ತ | 20 | 6.9% |
| ದ | 15 | 5.2% |
| ಕ | 13 | 4.5% |
| ಯ | 12 | 4.1% |
| ಪ | 12 | 4.1% |
| ಗ | 12 | 4.1% |
| ಸ | 12 | 4.1% |
| Other values (20) | 106 |
Ethiopic
| Value | Count | Frequency (%) |
| ን | 35 | 5.5% |
| ት | 35 | 5.5% |
| ይ | 28 | 4.4% |
| ስ | 23 | 3.6% |
| ው | 20 | 3.1% |
| ና | 19 | 3.0% |
| ር | 18 | 2.8% |
| ም | 17 | 2.7% |
| አ | 17 | 2.7% |
| ቻ | 16 | 2.5% |
| Other values (109) | 407 |
Telugu
| Value | Count | Frequency (%) |
| ర | 35 | 13.4% |
| న | 32 | 12.2% |
| ల | 21 | 8.0% |
| మ | 18 | 6.9% |
| చ | 17 | 6.5% |
| క | 15 | 5.7% |
| వ | 13 | 5.0% |
| ద | 13 | 5.0% |
| స | 12 | 4.6% |
| బ | 10 | 3.8% |
| Other values (15) | 76 |
Oriya
| Value | Count | Frequency (%) |
| ର | 32 | 17.6% |
| ନ | 16 | 8.8% |
| ବ | 13 | 7.1% |
| କ | 10 | 5.5% |
| ପ | 10 | 5.5% |
| ଇ | 10 | 5.5% |
| ତ | 7 | 3.8% |
| ହ | 6 | 3.3% |
| ଜ | 5 | 2.7% |
| ଦ | 5 | 2.7% |
| Other values (24) | 68 |
Bengali
| Value | Count | Frequency (%) |
| ন | 28 | 10.8% |
| ৰ | 24 | 9.3% |
| ব | 19 | 7.3% |
| ত | 19 | 7.3% |
| ক | 18 | 6.9% |
| ল | 16 | 6.2% |
| য | 15 | 5.8% |
| ম | 12 | 4.6% |
| শ | 10 | 3.9% |
| চ | 9 | 3.5% |
| Other values (25) | 89 |
Latin Ext Additional
| Value | Count | Frequency (%) |
| ế | 18 | 27.7% |
| ộ | 9 | 13.8% |
| ồ | 6 | 9.2% |
| ẵ | 6 | 9.2% |
| ệ | 3 | 4.6% |
| ễ | 3 | 4.6% |
| ầ | 3 | 4.6% |
| ớ | 2 | 3.1% |
| ị | 2 | 3.1% |
| ạ | 2 | 3.1% |
| Other values (10) | 11 |
Armenian
| Value | Count | Frequency (%) |
| է | 12 | 75.0% |
| ե | 2 | 12.5% |
| հ | 1 | 6.2% |
| պ | 1 | 6.2% |
Enclosed Alphanum
| Value | Count | Frequency (%) |
| ⒍ | 11 | 12.1% |
| ⒎ | 11 | 12.1% |
| ⒏ | 11 | 12.1% |
| ⒐ | 11 | 12.1% |
| ⒑ | 11 | 12.1% |
| ① | 5 | 5.5% |
| ③ | 4 | 4.4% |
| ④ | 4 | 4.4% |
| ② | 3 | 3.3% |
| ⒈ | 3 | 3.3% |
| Other values (7) | 17 |
Syloti Nagri
| Value | Count | Frequency (%) |
| ꠞ | 6 | 18.2% |
| ꠟ | 4 | 12.1% |
| ꠘ | 3 | 9.1% |
| ꠛ | 2 | 6.1% |
| ꠖ | 2 | 6.1% |
| ꠡ | 2 | 6.1% |
| ꠌ | 2 | 6.1% |
| ꠝ | 2 | 6.1% |
| ꠐ | 2 | 6.1% |
| ꠚ | 1 | 3.0% |
| Other values (7) | 7 |
Bopomofo
| Value | Count | Frequency (%) |
| ㄧ | 6 | 100.0% |
Alphabetic PF
| Value | Count | Frequency (%) |
| ﬁ | 4 | 100.0% |
UCAS
| Value | Count | Frequency (%) |
| ᖇ | 3 | 13.0% |
| ᑌ | 3 | 13.0% |
| ᖗ | 2 | 8.7% |
| ᖘ | 2 | 8.7% |
| ᗩ | 2 | 8.7% |
| ᔕ | 2 | 8.7% |
| ᑎ | 1 | 4.3% |
| ᕼ | 1 | 4.3% |
| ᗪ | 1 | 4.3% |
| ᑭ | 1 | 4.3% |
| Other values (5) | 5 |
IPA Ext
| Value | Count | Frequency (%) |
| ə | 3 | 42.9% |
| ʃ | 1 | 14.3% |
| ʊ | 1 | 14.3% |
| ʌ | 1 | 14.3% |
| ʘ | 1 | 14.3% |
Mongolian
| Value | Count | Frequency (%) |
| ᡝ | 3 | 37.5% |
| ᠴ | 2 | 25.0% |
| ᠵ | 1 | 12.5% |
| ᡳ | 1 | 12.5% |
| ᠨ | 1 | 12.5% |
Dingbats
| Value | Count | Frequency (%) |
| ❶ | 2 | 33.3% |
| ❷ | 2 | 33.3% |
| ❸ | 2 | 33.3% |
Modifier Letters
| Value | Count | Frequency (%) |
| ʽ | 2 | 66.7% |
| ˈ | 1 | 33.3% |
Georgian
| Value | Count | Frequency (%) |
| ღ | 2 | 28.6% |
| ი | 2 | 28.6% |
| ნ | 1 | 14.3% |
| ს | 1 | 14.3% |
| ა | 1 | 14.3% |
Malayalam
| Value | Count | Frequency (%) |
| മ | 1 | 33.3% |
| ഖ | 1 | 33.3% |
| ന | 1 | 33.3% |
CJK Ext B
| Value | Count | Frequency (%) |
| 𠯆 | 1 | 100.0% |
Number Forms
| Value | Count | Frequency (%) |
| ⅓ | 1 | 100.0% |
links
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 95588 |
|---|---|
| Distinct (%) | 63.0% |
| Missing | 349025 |
| Missing (%) | 69.7% |
| Memory size | 28.9 MiB |
Length
| Max length | 1115 |
|---|---|
| Median length | 630 |
| Mean length | 69.141761 |
| Min length | 11 |
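The Max, Mean and Min length rows are ordinary statistics of per-value string length over the non-missing cells. The minimal pandas sketch below computes them; `length_summary` is a hypothetical helper name, not the profiler's API. One caution: the reported Median length (630) cannot be a plain median of per-value lengths, because if at least half of the 151,685 non-missing links were 630 characters or longer, the mean would be at least 315 rather than 69.1. The profiler presumably derives that row some other way, and this sketch does not reproduce it.

```python
import pandas as pd


def length_summary(series: pd.Series) -> dict:
    """Character-length statistics over the non-missing values of a text column."""
    lengths = series.dropna().astype(str).str.len()
    return {
        "max_length": int(lengths.max()),
        "median_length": float(lengths.median()),
        "mean_length": float(lengths.mean()),
        "min_length": int(lengths.min()),
    }


# Three values taken from the links samples above, plus one missing cell.
sample = pd.Series(
    ["http://bit.ly/17lyTPM", None, "http://English.news", "https://sc.mp/2rCpF5x"]
)
print(length_summary(sample))
```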
Characters and Unicode
| Total characters | 10487768 |
|---|---|
| Distinct characters | 92 |
| Distinct categories | 11 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 80106 |
|---|---|
| Unique (%) | 52.8% |
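The Distinct (%) and Unique (%) figures are consistent with dividing by the number of non-missing values rather than by all 500,710 rows: 95,588 / 151,685 ≈ 63.0% and 80,106 / 151,685 ≈ 52.8%. The sketch below computes the same quantities with pandas. Reading "Unique" as "values that occur exactly once" is our assumption, and `distinct_unique` is a hypothetical helper.

```python
import pandas as pd


def distinct_unique(series: pd.Series) -> dict:
    """Distinct/unique counts as % of non-missing values; missing as % of all rows."""
    non_missing = series.dropna()
    counts = non_missing.value_counts()
    n = len(non_missing)
    unique = int((counts == 1).sum())  # values that occur exactly once
    return {
        "distinct": len(counts),
        "distinct_pct": 100 * len(counts) / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
        "missing": len(series) - n,
        "missing_pct": 100 * (len(series) - n) / len(series),
    }


print(distinct_unique(pd.Series(["a", "a", "b", None])))

# Cross-check against the links figures reported above.
non_missing_links = 500710 - 349025  # 151685
print(round(100 * 95588 / non_missing_links, 1))  # 63.0
print(round(100 * 80106 / non_missing_links, 1))  # 52.8
```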
Sample
| 1st row | http://bit.ly/17lyTPM |
|---|---|
| 2nd row | http://bit.ly/17lySv6 |
| 3rd row | http://usa.chinadaily.com.cn/epaper/2013-09/03/content_16940556.htm |
| 4th row | http://usa.chinadaily.com.cn/epaper/2013-09/03/content_16940556.htm |
| 5th row | http://buff.ly/18qBTwC |
Common Values
| Value | Count | Frequency (%) |
| https://www.reuters.com/?edition-redirect=uk | 328 | 0.1% |
| https://nyti.ms/2GutBQB | 282 | 0.1% |
| https://www.youtube.com/watch?v=cUxw9Re-Z-E&feature=youtu.be | 194 | < 0.1% |
| http://English.news | 190 | < 0.1% |
| https://www.bbc.co.uk/news/world-asia-39912671 | 185 | < 0.1% |
| https://www.lavoceditrieste.net/2015/10/31/does-rome-want-to-eliminate-the-european-singapore/ | 177 | < 0.1% |
| https://sc.mp/2rCpF5x | 176 | < 0.1% |
| https://www.reuters.com/?edition-redirect=in | 176 | < 0.1% |
| https://www.lavoceditrieste.net/2015/10/31/roma-vuole-eliminare-la-singapore-deuropa/ | 170 | < 0.1% |
| https://edition.cnn.com/2017/05/13/asia/china-belt-and-road-forum-xi-putin-erdogan/index.html | 162 | < 0.1% |
| Other values (95578) | 149645 | 29.9% |
| (Missing) | 349025 | 69.7% |
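In the Common Values table, each frequency is relative to all 500,710 rows, missing ones included (328 / 500,710 ≈ 0.07%, shown as 0.1%). The ten listed values (2,040 rows) plus the Other values row (149,645) add up to the 151,685 non-missing links. Under that reading, such a table can be assembled with pandas as sketched below; `common_values` is a hypothetical helper, not the profiler's own code.

```python
import pandas as pd


def common_values(series: pd.Series, top: int = 10) -> pd.DataFrame:
    """Top-N value table with an 'Other values (k)' row and a '(Missing)' row.

    Frequencies are percentages of all rows, missing cells included.
    """
    total = len(series)
    counts = series.value_counts()  # NaN/None are excluded by default
    rows = [(str(value), int(count)) for value, count in counts.head(top).items()]
    rest = counts.iloc[top:]
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))
    missing = int(series.isna().sum())
    if missing > 0:
        rows.append(("(Missing)", missing))
    table = pd.DataFrame(rows, columns=["Value", "Count"])
    table["Frequency (%)"] = (100 * table["Count"] / total).round(1)
    return table


print(common_values(pd.Series(["a", "a", "b", "c", None]), top=1))
```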
Words
| Value | Count | Frequency (%) |
| https://www.reuters.com/?edition-redirect=uk | 328 | 0.2% |
| https://nyti.ms/2gutbqb | 282 | 0.2% |
| https://www.youtube.com/watch?v=cuxw9re-z-e&feature=youtu.be | 194 | 0.1% |
| http://english.news | 190 | 0.1% |
| https://www.bbc.co.uk/news/world-asia-39912671 | 185 | 0.1% |
| https://www.lavoceditrieste.net/2015/10/31/does-rome-want-to-eliminate-the-european-singapore | 177 | 0.1% |
| https://sc.mp/2rcpf5x | 176 | 0.1% |
| https://www.reuters.com/?edition-redirect=in | 176 | 0.1% |
| https://www.lavoceditrieste.net/2015/10/31/roma-vuole-eliminare-la-singapore-deuropa | 170 | 0.1% |
| https://edition.cnn.com/2017/05/13/asia/china-belt-and-road-forum-xi-putin-erdogan/index.html | 162 | 0.1% |
| Other values (95500) | 149646 |
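The token table above holds lowercased values (`https://nyti.ms/2gutbqb`), and the lavoceditrieste URLs have lost their trailing `/`. This suggests that each value is lowercased, split on whitespace, and stripped of surrounding punctuation before counting. The sketch below assumes exactly that pipeline. The profiler's precise tokenisation (for example, any stop-word filtering) is not confirmed here, and `word_counts` is a hypothetical name.

```python
import string
from collections import Counter

import pandas as pd


def word_counts(series: pd.Series) -> Counter:
    """Lowercase, split on whitespace, strip surrounding punctuation, count tokens."""
    tokens = (
        series.dropna()
        .astype(str)
        .str.lower()
        .str.split()
        .explode()
        .dropna()  # empty strings split to [] and explode to NaN
        .str.strip(string.punctuation)
    )
    return Counter(t for t in tokens if t)


hashtags = pd.Series(["China BeltandRoad", "beltandroad", None, "BeltAndRoad OBOR"])
print(word_counts(hashtags).most_common(3))
```

The same call on the hashtag column is what would merge case variants such as `BeltandRoad`, `beltandroad` and `BeltAndRoad` into one token.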
Most occurring characters
| Value | Count | Frequency (%) |
| t | 830242 | 7.9% |
| / | 631408 | 6.0% |
| e | 537152 | 5.1% |
| i | 440320 | 4.2% |
| o | 424284 | 4.0% |
| s | 422727 | 4.0% |
| - | 418155 | 4.0% |
| a | 411726 | 3.9% |
| n | 353806 | 3.4% |
| c | 323647 | 3.1% |
| Other values (82) | 5694301 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 6875107 | 65.6% |
| Other Punctuation | 1228186 | 11.7% |
| Decimal Number | 1046914 | 10.0% |
| Uppercase Letter | 707794 | 6.7% |
| Dash Punctuation | 418155 | 4.0% |
| Math Symbol | 136392 | 1.3% |
| Connector Punctuation | 75041 | 0.7% |
| Open Punctuation | 87 | < 0.1% |
| Close Punctuation | 85 | < 0.1% |
| Currency Symbol | 6 | < 0.1% |
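The category labels are the long names of Unicode General Category values, for which Python's `unicodedata.category` returns two-letter codes (`Ll` = Lowercase Letter, `Po` = Other Punctuation, `Sm` = Math Symbol, and so on). The per-category Frequency columns in the tables that follow are consistent with dividing by the category total; for example, `t`: 830,242 / 6,875,107 ≈ 12.1%. Below is a standard-library sketch that groups characters this way. The `CATEGORY_NAMES` mapping is our own and covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes seen in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Lm": "Modifier Letter",
    "Lo": "Other Letter", "Mn": "Nonspacing Mark", "Mc": "Spacing Mark",
    "Nd": "Decimal Number", "Pc": "Connector Punctuation", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Po": "Other Punctuation",
    "Sm": "Math Symbol", "Sc": "Currency Symbol", "So": "Other Symbol",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Map each category (long name, else the raw code) to a Counter of its characters."""
    per_category = {}
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            name = CATEGORY_NAMES.get(code, code)
            per_category.setdefault(name, Counter())[ch] += 1
    return per_category


def with_frequency(counter):
    """(character, count, % of the category total) rows, most common first."""
    total = sum(counter.values())
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counter.most_common()]


cats = category_counts(["https://t.co/Ab_1"])
print({name: sum(c.values()) for name, c in cats.items()})
print(with_frequency(cats["Other Punctuation"]))
```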
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 830242 | 12.1% |
| e | 537152 | 7.8% |
| i | 440320 | 6.4% |
| o | 424284 | 6.2% |
| s | 422727 | 6.1% |
| a | 411726 | 6.0% |
| n | 353806 | 5.1% |
| c | 323647 | 4.7% |
| r | 321209 | 4.7% |
| h | 318879 | 4.6% |
| Other values (17) | 2491115 |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 38900 | 5.5% |
| S | 38296 | 5.4% |
| W | 37600 | 5.3% |
| A | 33851 | 4.8% |
| F | 32353 | 4.6% |
| Y | 32148 | 4.5% |
| R | 32079 | 4.5% |
| C | 31597 | 4.5% |
| Z | 31460 | 4.4% |
| X | 31184 | 4.4% |
| Other values (16) | 368326 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 631408 | 51.4% |
| . | 282357 | 23.0% |
| : | 162784 | 13.3% |
| & | 94795 | 7.7% |
| ? | 38465 | 3.1% |
| % | 16946 | 1.4% |
| # | 838 | 0.1% |
| , | 152 | < 0.1% |
| ! | 134 | < 0.1% |
| ' | 107 | < 0.1% |
| Other values (5) | 200 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 164762 | 15.7% |
| 1 | 157311 | 15.0% |
| 0 | 153878 | 14.7% |
| 3 | 93127 | 8.9% |
| 8 | 88322 | 8.4% |
| 5 | 81760 | 7.8% |
| 9 | 79643 | 7.6% |
| 7 | 78387 | 7.5% |
| 6 | 75027 | 7.2% |
| 4 | 74697 | 7.1% |
Math Symbol
| Value | Count | Frequency (%) |
| = | 133503 | 97.9% |
| + | 2677 | 2.0% |
| \| | 191 | 0.1% |
| ~ | 21 | < 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 48 | 55.2% |
| ( | 35 | 40.2% |
| { | 4 | 4.6% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 46 | 54.1% |
| ) | 35 | 41.2% |
| } | 4 | 4.7% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 418155 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 75041 | 100.0% |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 6 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 7582901 | 72.3% |
| Common | 2904867 | 27.7% |
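The Latin/Common split arises because digits, `/`, `-`, `.` and the other URL punctuation belong to the Unicode Common script rather than to Latin. Python's standard library exposes general categories but not the Script property, so the sketch below only approximates it. The heuristic takes the first word of a letter's or mark's Unicode name and labels everything else Common. It is a heuristic of our own, not the profiler's method: CJK ideographs come back as `Cjk` rather than `Han`, and combining marks as `Combining` rather than `Inherited`. Exact counts need the Unicode Scripts data, for example via the third-party `regex` module's script property classes.

```python
import unicodedata
from collections import Counter


def rough_script(ch: str) -> str:
    """Approximate a character's script from its Unicode name (heuristic only)."""
    if unicodedata.category(ch)[0] in "LM":  # letters and marks
        name = unicodedata.name(ch, "")
        if name:
            return name.split()[0].title()  # e.g. "LATIN SMALL LETTER T" -> "Latin"
    # Digits, punctuation, symbols and separators are shared across scripts.
    return "Common"


def script_counts(values) -> Counter:
    """Count characters per approximate script across an iterable of strings."""
    return Counter(rough_script(ch) for value in values for ch in value)


print(script_counts(["http://bit.ly/17lyTPM"]))
```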
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 830242 | 10.9% |
| e | 537152 | 7.1% |
| i | 440320 | 5.8% |
| o | 424284 | 5.6% |
| s | 422727 | 5.6% |
| a | 411726 | 5.4% |
| n | 353806 | 4.7% |
| c | 323647 | 4.3% |
| r | 321209 | 4.2% |
| h | 318879 | 4.2% |
| Other values (43) | 3198909 |
Common
| Value | Count | Frequency (%) |
| / | 631408 | 21.7% |
| - | 418155 | 14.4% |
| . | 282357 | 9.7% |
| 2 | 164762 | 5.7% |
| : | 162784 | 5.6% |
| 1 | 157311 | 5.4% |
| 0 | 153878 | 5.3% |
| = | 133503 | 4.6% |
| & | 94795 | 3.3% |
| 3 | 93127 | 3.2% |
| Other values (29) | 612787 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10487767 | 100.0% |
| None | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 830242 | 7.9% |
| / | 631408 | 6.0% |
| e | 537152 | 5.1% |
| i | 440320 | 4.2% |
| o | 424284 | 4.0% |
| s | 422727 | 4.0% |
| - | 418155 | 4.0% |
| a | 411726 | 3.9% |
| n | 353806 | 3.4% |
| c | 323647 | 3.1% |
| Other values (81) | 5694300 |
None
| Value | Count | Frequency (%) |
| ł | 1 | 100.0% |
hashtag
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 93371 |
|---|---|
| Distinct (%) | 49.4% |
| Missing | 311862 |
| Missing (%) | 62.3% |
| Memory size | 25.5 MiB |
Length
| Max length | 257 |
|---|---|
| Median length | 237 |
| Mean length | 29.637402 |
| Min length | 1 |
Characters and Unicode
| Total characters | 5596964 |
|---|---|
| Distinct characters | 1909 |
| Distinct categories | 13 |
| Distinct scripts | 24 |
| Distinct blocks | 28 |
Unique
| Unique | 80452 |
|---|---|
| Unique (%) | 42.6% |
Sample
| 1st row | China |
|---|---|
| 2nd row | Asia energy NewSilkRoad |
| 3rd row | china asia energy |
| 4th row | China |
| 5th row | Mongolia |
Common Values
| Value | Count | Frequency (%) |
| BeltandRoad | 14910 | 3.0% |
| China | 3760 | 0.8% |
| beltandroad | 1538 | 0.3% |
| China BeltandRoad | 1397 | 0.3% |
| OBOR | 1201 | 0.2% |
| BRI | 1194 | 0.2% |
| KuşakveYol BeltandRoad | 1107 | 0.2% |
| BeltAndRoad | 1074 | 0.2% |
| OneBeltOneRoad | 886 | 0.2% |
| NewSilkRoad | 667 | 0.1% |
| Other values (93361) | 161114 | 32.2% |
| (Missing) | 311862 | 62.3% |
Words
| Value | Count | Frequency (%) |
| beltandroad | 84501 | 14.1% |
| china | 53618 | 8.9% |
| bri | 18497 | 3.1% |
| obor | 16567 | 2.8% |
| onebeltoneroad | 11846 | 2.0% |
| silkroad | 6758 | 1.1% |
| cpec | 5759 | 1.0% |
| pakistan | 5214 | 0.9% |
| newsilkroad | 5040 | 0.8% |
| beltandroadinitiative | 4397 | 0.7% |
| Other values (50836) | 388464 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 586167 | 10.5% |
| n | 421044 | 7.5% |
| (space) | 411813 | 7.4% |
| e | 396264 | 7.1% |
| i | 364066 | 6.5% |
| o | 322897 | 5.8% |
| t | 280890 | 5.0% |
| d | 276723 | 4.9% |
| l | 241180 | 4.3% |
| r | 212981 | 3.8% |
| Other values (1899) | 2082939 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4084340 | 73.0% |
| Uppercase Letter | 1016468 | 18.2% |
| Space Separator | 411813 | 7.4% |
| Decimal Number | 44518 | 0.8% |
| Other Letter | 31881 | 0.6% |
| Connector Punctuation | 5795 | 0.1% |
| Nonspacing Mark | 1409 | < 0.1% |
| Modifier Letter | 288 | < 0.1% |
| Spacing Mark | 232 | < 0.1% |
| Other Punctuation | 165 | < 0.1% |
| Other values (3) | 55 | < 0.1% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| ا | 2215 | 6.9% |
| 一 | 2022 | 6.3% |
| 路 | 1031 | 3.2% |
| ر | 901 | 2.8% |
| ل | 871 | 2.7% |
| ي | 764 | 2.4% |
| و | 705 | 2.2% |
| ن | 664 | 2.1% |
| س | 611 | 1.9% |
| ت | 594 | 1.9% |
| Other values (1527) | 21503 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 586167 | 14.4% |
| n | 421044 | 10.3% |
| e | 396264 | 9.7% |
| i | 364066 | 8.9% |
| o | 322897 | 7.9% |
| t | 280890 | 6.9% |
| d | 276723 | 6.8% |
| l | 241180 | 5.9% |
| r | 212981 | 5.2% |
| s | 176505 | 4.3% |
| Other values (152) | 805623 |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 159416 | 15.7% |
| R | 157365 | 15.5% |
| C | 121061 | 11.9% |
| O | 84752 | 8.3% |
| I | 56868 | 5.6% |
| A | 55246 | 5.4% |
| S | 49985 | 4.9% |
| T | 40873 | 4.0% |
| P | 40526 | 4.0% |
| E | 35502 | 3.5% |
| Other values (88) | 214874 |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ี | 209 | 14.8% |
| ่ | 191 | 13.6% |
| ิ | 171 | 12.1% |
| ้ | 124 | 8.8% |
| ั | 122 | 8.7% |
| ် | 91 | 6.5% |
| ุ | 67 | 4.8% |
| ု | 60 | 4.3% |
| ์ | 53 | 3.8% |
| ู | 30 | 2.1% |
| Other values (43) | 291 |
Spacing Mark
| Value | Count | Frequency (%) |
| ा | 43 | 18.5% |
| ी | 27 | 11.6% |
| း | 26 | 11.2% |
| ာ | 22 | 9.5% |
| ो | 22 | 9.5% |
| ि | 21 | 9.1% |
| ေ | 17 | 7.3% |
| ြ | 16 | 6.9% |
| ா | 5 | 2.2% |
| ி | 5 | 2.2% |
| Other values (20) | 28 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 10888 | 24.5% |
| 2 | 8727 | 19.6% |
| 0 | 8280 | 18.6% |
| 9 | 6352 | 14.3% |
| 7 | 2395 | 5.4% |
| 5 | 2147 | 4.8% |
| 3 | 1990 | 4.5% |
| 8 | 1763 | 4.0% |
| 4 | 1123 | 2.5% |
| 6 | 842 | 1.9% |
| Other values (7) | 11 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 106 | 64.2% |
| , | 30 | 18.2% |
| ・ | 19 | 11.5% |
| \ | 7 | 4.2% |
| · | 3 | 1.8% |
Modifier Letter
| Value | Count | Frequency (%) |
| ー | 287 | 99.7% |
| 々 | 1 | 0.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 411813 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 5795 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 23 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 23 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 9 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5093212 | 91.0% |
| Common | 462686 | 8.3% |
| Arabic | 12112 | 0.2% |
| Han | 11830 | 0.2% |
| Cyrillic | 7027 | 0.1% |
| Thai | 4913 | 0.1% |
| Katakana | 2522 | < 0.1% |
| Myanmar | 691 | < 0.1% |
| Hangul | 543 | < 0.1% |
| Greek | 486 | < 0.1% |
| Other values (14) | 942 | < 0.1% |
Most frequent character per script
Han
| Value | Count | Frequency (%) |
| 一 | 2022 | 17.1% |
| 路 | 1031 | 8.7% |
| 带 | 585 | 4.9% |
| 中 | 472 | 4.0% |
| 国 | 376 | 3.2% |
| 帯 | 276 | 2.3% |
| 新 | 222 | 1.9% |
| 旅 | 150 | 1.3% |
| 游 | 144 | 1.2% |
| 译 | 143 | 1.2% |
| Other values (1049) | 6409 |
Latin
| Value | Count | Frequency (%) |
| a | 586167 | 11.5% |
| n | 421044 | 8.3% |
| e | 396264 | 7.8% |
| i | 364066 | 7.1% |
| o | 322897 | 6.3% |
| t | 280890 | 5.5% |
| d | 276723 | 5.4% |
| l | 241180 | 4.7% |
| r | 212981 | 4.2% |
| s | 176505 | 3.5% |
| Other values (112) | 1814495 |
Hangul
| Value | Count | Frequency (%) |
| 티 | 40 | 7.4% |
| 켓 | 30 | 5.5% |
| 파 | 28 | 5.2% |
| 인 | 24 | 4.4% |
| 공 | 21 | 3.9% |
| 크 | 19 | 3.5% |
| 터 | 19 | 3.5% |
| 국 | 19 | 3.5% |
| 지 | 18 | 3.3% |
| 중 | 17 | 3.1% |
| Other values (94) | 308 |
Katakana
| Value | Count | Frequency (%) |
| ス | 275 | 10.9% |
| ア | 166 | 6.6% |
| イ | 165 | 6.5% |
| ル | 162 | 6.4% |
| ロ | 142 | 5.6% |
| ウ | 137 | 5.4% |
| ナ | 123 | 4.9% |
| コ | 123 | 4.9% |
| ン | 111 | 4.4% |
| ト | 78 | 3.1% |
| Other values (65) | 1040 |
Cyrillic
| Value | Count | Frequency (%) |
| и | 816 | 11.6% |
| а | 646 | 9.2% |
| н | 530 | 7.5% |
| е | 392 | 5.6% |
| с | 381 | 5.4% |
| т | 364 | 5.2% |
| р | 345 | 4.9% |
| ь | 338 | 4.8% |
| к | 322 | 4.6% |
| л | 302 | 4.3% |
| Other values (52) | 2591 |
Thai
| Value | Count | Frequency (%) |
| า | 349 | 7.1% |
| น | 333 | 6.8% |
| ร | 277 | 5.6% |
| เ | 231 | 4.7% |
| ี | 209 | 4.3% |
| ก | 202 | 4.1% |
| อ | 193 | 3.9% |
| ่ | 191 | 3.9% |
| ง | 179 | 3.6% |
| ิ | 171 | 3.5% |
| Other values (50) | 2578 |
Arabic
| Value | Count | Frequency (%) |
| ا | 2215 | 18.3% |
| ر | 901 | 7.4% |
| ل | 871 | 7.2% |
| ي | 764 | 6.3% |
| و | 705 | 5.8% |
| ن | 664 | 5.5% |
| س | 611 | 5.0% |
| ت | 594 | 4.9% |
| م | 480 | 4.0% |
| ی | 393 | 3.2% |
| Other values (48) | 3914 |
Hiragana
| Value | Count | Frequency (%) |
| の | 25 | 11.5% |
| い | 10 | 4.6% |
| た | 9 | 4.1% |
| し | 9 | 4.1% |
| と | 9 | 4.1% |
| り | 8 | 3.7% |
| に | 8 | 3.7% |
| は | 8 | 3.7% |
| で | 8 | 3.7% |
| う | 8 | 3.7% |
| Other values (39) | 116 |
Greek
| Value | Count | Frequency (%) |
| α | 44 | 9.1% |
| ν | 35 | 7.2% |
| ο | 31 | 6.4% |
| ι | 25 | 5.1% |
| η | 23 | 4.7% |
| τ | 22 | 4.5% |
| ρ | 21 | 4.3% |
| ε | 20 | 4.1% |
| κ | 19 | 3.9% |
| Ε | 18 | 3.7% |
| Other values (38) | 228 |
Devanagari
| Value | Count | Frequency (%) |
| ा | 43 | 9.3% |
| न | 32 | 6.9% |
| र | 28 | 6.0% |
| ी | 27 | 5.8% |
| ो | 22 | 4.7% |
| ि | 21 | 4.5% |
| ् | 19 | 4.1% |
| त | 18 | 3.9% |
| क | 18 | 3.9% |
| े | 17 | 3.7% |
| Other values (37) | 219 |
Myanmar
| Value | Count | Frequency (%) |
| ် | 91 | 13.2% |
| ု | 60 | 8.7% |
| င | 48 | 6.9% |
| န | 46 | 6.7% |
| တ | 36 | 5.2% |
| ိ | 28 | 4.1% |
| ရ | 26 | 3.8% |
| က | 26 | 3.8% |
| း | 26 | 3.8% |
| ပ | 25 | 3.6% |
| Other values (36) | 279 |
Common
| Value | Count | Frequency (%) |
| (space) | 411813 | 89.0% |
| 1 | 10888 | 2.4% |
| 2 | 8727 | 1.9% |
| 0 | 8280 | 1.8% |
| 9 | 6352 | 1.4% |
| _ | 5795 | 1.3% |
| 7 | 2395 | 0.5% |
| 5 | 2147 | 0.5% |
| 3 | 1990 | 0.4% |
| 8 | 1763 | 0.4% |
| Other values (26) | 2536 | 0.5% |
Sinhala
| Value | Count | Frequency (%) |
| ි | 4 | 10.3% |
| න | 3 | 7.7% |
| ල | 3 | 7.7% |
| ය | 3 | 7.7% |
| ් | 3 | 7.7% |
| ක | 2 | 5.1% |
| ත | 2 | 5.1% |
| ස | 2 | 5.1% |
| බ | 1 | 2.6% |
| ජ | 1 | 2.6% |
| Other values (15) | 15 |
Tamil
| Value | Count | Frequency (%) |
| ா | 5 | 8.1% |
| ி | 5 | 8.1% |
| ் | 5 | 8.1% |
| ச | 5 | 8.1% |
| த | 4 | 6.5% |
| ர | 4 | 6.5% |
| ந | 3 | 4.8% |
| ள | 3 | 4.8% |
| க | 3 | 4.8% |
| ப | 3 | 4.8% |
| Other values (13) | 22 |
Kannada
| Value | Count | Frequency (%) |
| ್ | 6 | 18.2% |
| ಾ | 4 | 12.1% |
| ರ | 3 | 9.1% |
| ನ | 3 | 9.1% |
| ಒ | 2 | 6.1% |
| ಭ | 1 | 3.0% |
| ೀ | 1 | 3.0% |
| ಚ | 1 | 3.0% |
| ಡ | 1 | 3.0% |
| ೋ | 1 | 3.0% |
| Other values (10) | 10 |
Hebrew
| Value | Count | Frequency (%) |
| י | 12 | 26.1% |
| ס | 6 | 13.0% |
| ן | 6 | 13.0% |
| א | 5 | 10.9% |
| ה | 4 | 8.7% |
| ר | 3 | 6.5% |
| פ | 2 | 4.3% |
| ו | 2 | 4.3% |
| ב | 1 | 2.2% |
| ק | 1 | 2.2% |
| Other values (4) | 4 | 8.7% |
Lao
| Value | Count | Frequency (%) |
| າ | 3 | 18.8% |
| ຖ | 1 | 6.2% |
| ້ | 1 | 6.2% |
| ກ | 1 | 6.2% |
| ນ | 1 | 6.2% |
| ເ | 1 | 6.2% |
| ມ | 1 | 6.2% |
| ື | 1 | 6.2% |
| ອ | 1 | 6.2% |
| ງ | 1 | 6.2% |
| Other values (4) | 4 |
Bengali
| Value | Count | Frequency (%) |
| ন | 3 | 17.6% |
| ত | 2 | 11.8% |
| ্ | 2 | 11.8% |
| গ | 1 | 5.9% |
| ণ | 1 | 5.9% |
| ৰ | 1 | 5.9% |
| ী | 1 | 5.9% |
| ক | 1 | 5.9% |
| ে | 1 | 5.9% |
| প | 1 | 5.9% |
| Other values (3) | 3 |
Georgian
| Value | Count | Frequency (%) |
| ო | 3 | 18.8% |
| ნ | 2 | 12.5% |
| ი | 2 | 12.5% |
| რ | 2 | 12.5% |
| ა | 2 | 12.5% |
| დ | 1 | 6.2% |
| ფ | 1 | 6.2% |
| პ | 1 | 6.2% |
| ს | 1 | 6.2% |
| ტ | 1 | 6.2% |
Ethiopic
| Value | Count | Frequency (%) |
| ይ | 1 | 12.5% |
| ቻ | 1 | 12.5% |
| ና | 1 | 12.5% |
| ኢ | 1 | 12.5% |
| ት | 1 | 12.5% |
| ዮ | 1 | 12.5% |
| ጵ | 1 | 12.5% |
| ያ | 1 | 12.5% |
Oriya
| Value | Count | Frequency (%) |
| ୁ | 2 | 25.0% |
| ଶ | 1 | 12.5% |
| ଆ | 1 | 12.5% |
| ନ | 1 | 12.5% |
| ବ | 1 | 12.5% |
| ା | 1 | 12.5% |
| ଦ | 1 | 12.5% |
Inherited
| Value | Count | Frequency (%) |
| َ | 4 | 44.4% |
| ِ | 2 | 22.2% |
| ️ | 1 | 11.1% |
| ُ | 1 | 11.1% |
| ٍ | 1 | 11.1% |
Armenian
| Value | Count | Frequency (%) |
| Ե | 1 | 33.3% |
| Պ | 1 | 33.3% |
| Հ | 1 | 33.3% |
Bopomofo
| Value | Count | Frequency (%) |
| ㄧ | 3 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5548321 | 99.1% |
| Arabic | 12120 | 0.2% |
| CJK | 11828 | 0.2% |
| None | 7694 | 0.1% |
| Cyrillic | 7027 | 0.1% |
| Thai | 4913 | 0.1% |
| Katakana | 2822 | 0.1% |
| Myanmar | 691 | < 0.1% |
| Hangul | 543 | < 0.1% |
| Devanagari | 464 | < 0.1% |
| Other values (18) | 541 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 586167 | 10.6% |
| n | 421044 | 7.6% |
| (space) | 411813 | 7.4% |
| e | 396264 | 7.1% |
| i | 364066 | 6.6% |
| o | 322897 | 5.8% |
| t | 280890 | 5.1% |
| d | 276723 | 5.0% |
| l | 241180 | 4.3% |
| r | 212981 | 3.8% |
| Other values (60) | 2034296 |
None
| Value | Count | Frequency (%) |
| ş | 2529 | 32.9% |
| Ç | 1061 | 13.8% |
| ü | 739 | 9.6% |
| ß | 360 | 4.7% |
| ç | 279 | 3.6% |
| ž | 252 | 3.3% |
| ó | 235 | 3.1% |
| ö | 203 | 2.6% |
| é | 168 | 2.2% |
| ł | 163 | 2.1% |
| Other values (113) | 1705 |
Arabic
| Value | Count | Frequency (%) |
| ا | 2215 | 18.3% |
| ر | 901 | 7.4% |
| ل | 871 | 7.2% |
| ي | 764 | 6.3% |
| و | 705 | 5.8% |
| ن | 664 | 5.5% |
| س | 611 | 5.0% |
| ت | 594 | 4.9% |
| م | 480 | 4.0% |
| ی | 393 | 3.2% |
| Other values (52) | 3922 |
CJK
| Value | Count | Frequency (%) |
| 一 | 2022 | 17.1% |
| 路 | 1031 | 8.7% |
| 带 | 585 | 4.9% |
| 中 | 472 | 4.0% |
| 国 | 376 | 3.2% |
| 帯 | 276 | 2.3% |
| 新 | 222 | 1.9% |
| 旅 | 150 | 1.3% |
| 游 | 144 | 1.2% |
| 译 | 143 | 1.2% |
| Other values (1047) | 6407 |
Cyrillic
| Value | Count | Frequency (%) |
| и | 816 | 11.6% |
| а | 646 | 9.2% |
| н | 530 | 7.5% |
| е | 392 | 5.6% |
| с | 381 | 5.4% |
| т | 364 | 5.2% |
| р | 345 | 4.9% |
| ь | 338 | 4.8% |
| к | 322 | 4.6% |
| л | 302 | 4.3% |
| Other values (52) | 2591 |
Thai
| Value | Count | Frequency (%) |
| า | 349 | 7.1% |
| น | 333 | 6.8% |
| ร | 277 | 5.6% |
| เ | 231 | 4.7% |
| ี | 209 | 4.3% |
| ก | 202 | 4.1% |
| อ | 193 | 3.9% |
| ่ | 191 | 3.9% |
| ง | 179 | 3.6% |
| ิ | 171 | 3.5% |
| Other values (50) | 2578 |
Katakana
| Value | Count | Frequency (%) |
| ー | 287 | 10.2% |
| ス | 275 | 9.7% |
| ア | 166 | 5.9% |
| イ | 165 | 5.8% |
| ル | 162 | 5.7% |
| ロ | 142 | 5.0% |
| ウ | 137 | 4.9% |
| ナ | 123 | 4.4% |
| コ | 123 | 4.4% |
| ン | 111 | 3.9% |
| Other values (61) | 1131 |
Myanmar
| Value | Count | Frequency (%) |
| ် | 91 | 13.2% |
| ု | 60 | 8.7% |
| င | 48 | 6.9% |
| န | 46 | 6.7% |
| တ | 36 | 5.2% |
| ိ | 28 | 4.1% |
| ရ | 26 | 3.8% |
| က | 26 | 3.8% |
| း | 26 | 3.8% |
| ပ | 25 | 3.6% |
| Other values (36) | 279 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ℑ | 45 | 100.0% |
Devanagari
| Value | Count | Frequency (%) |
| ा | 43 | 9.3% |
| न | 32 | 6.9% |
| र | 28 | 6.0% |
| ी | 27 | 5.8% |
| ो | 22 | 4.7% |
| ि | 21 | 4.5% |
| ् | 19 | 4.1% |
| त | 18 | 3.9% |
| क | 18 | 3.9% |
| े | 17 | 3.7% |
| Other values (37) | 219 |
Hangul
| Value | Count | Frequency (%) |
| 티 | 40 | 7.4% |
| 켓 | 30 | 5.5% |
| 파 | 28 | 5.2% |
| 인 | 24 | 4.4% |
| 공 | 21 | 3.9% |
| 크 | 19 | 3.5% |
| 터 | 19 | 3.5% |
| 국 | 19 | 3.5% |
| 지 | 18 | 3.3% |
| 중 | 17 | 3.1% |
| Other values (94) | 308 |
Hiragana
| Value | Count | Frequency (%) |
| の | 25 | 11.5% |
| い | 10 | 4.6% |
| た | 9 | 4.1% |
| し | 9 | 4.1% |
| と | 9 | 4.1% |
| り | 8 | 3.7% |
| に | 8 | 3.7% |
| は | 8 | 3.7% |
| で | 8 | 3.7% |
| う | 8 | 3.7% |
| Other values (39) | 116 |
Hebrew
| Value | Count | Frequency (%) |
| י | 12 | 26.1% |
| ס | 6 | 13.0% |
| ן | 6 | 13.0% |
| א | 5 | 10.9% |
| ה | 4 | 8.7% |
| ר | 3 | 6.5% |
| פ | 2 | 4.3% |
| ו | 2 | 4.3% |
| ב | 1 | 2.2% |
| ק | 1 | 2.2% |
| Other values (4) | 4 | 8.7% |
Kannada
| Value | Count | Frequency (%) |
| ್ | 6 | 18.2% |
| ಾ | 4 | 12.1% |
| ರ | 3 | 9.1% |
| ನ | 3 | 9.1% |
| ಒ | 2 | 6.1% |
| ಭ | 1 | 3.0% |
| ೀ | 1 | 3.0% |
| ಚ | 1 | 3.0% |
| ಡ | 1 | 3.0% |
| ೋ | 1 | 3.0% |
| Other values (10) | 10 |
Tamil
| Value | Count | Frequency (%) |
| ா | 5 | 8.1% |
| ி | 5 | 8.1% |
| ் | 5 | 8.1% |
| ச | 5 | 8.1% |
| த | 4 | 6.5% |
| ர | 4 | 6.5% |
| ந | 3 | 4.8% |
| ள | 3 | 4.8% |
| க | 3 | 4.8% |
| ப | 3 | 4.8% |
| Other values (13) | 22 |
Math Alphanum
| Value | Count | Frequency (%) |
| 𝗶 | 4 | 21.1% |
| 𝗻 | 3 | 15.8% |
| 𝗫 | 1 | 5.3% |
| 𝗝 | 1 | 5.3% |
| 𝗽 | 1 | 5.3% |
| 𝗴 | 1 | 5.3% |
| 𝗣 | 1 | 5.3% |
| 𝘂 | 1 | 5.3% |
| 𝘁 | 1 | 5.3% |
| 𝒪 | 1 | 5.3% |
| Other values (4) | 4 |
Sinhala
| Value | Count | Frequency (%) |
| ි | 4 | 10.3% |
| න | 3 | 7.7% |
| ල | 3 | 7.7% |
| ය | 3 | 7.7% |
| ් | 3 | 7.7% |
| ක | 2 | 5.1% |
| ත | 2 | 5.1% |
| ස | 2 | 5.1% |
| බ | 1 | 2.6% |
| ජ | 1 | 2.6% |
| Other values (15) | 15 |
IPA Ext
| Value | Count | Frequency (%) |
| ə | 4 | 100.0% |
Lao
| Value | Count | Frequency (%) |
| າ | 3 | 18.8% |
| ຖ | 1 | 6.2% |
| ້ | 1 | 6.2% |
| ກ | 1 | 6.2% |
| ນ | 1 | 6.2% |
| ເ | 1 | 6.2% |
| ມ | 1 | 6.2% |
| ື | 1 | 6.2% |
| ອ | 1 | 6.2% |
| ງ | 1 | 6.2% |
| Other values (4) | 4 |
Bopomofo
| Value | Count | Frequency (%) |
| ㄧ | 3 | 100.0% |
Bengali
| Value | Count | Frequency (%) |
| ন | 3 | 17.6% |
| ত | 2 | 11.8% |
| ্ | 2 | 11.8% |
| গ | 1 | 5.9% |
| ণ | 1 | 5.9% |
| ৰ | 1 | 5.9% |
| ী | 1 | 5.9% |
| ক | 1 | 5.9% |
| ে | 1 | 5.9% |
| প | 1 | 5.9% |
| Other values (3) | 3 |
Georgian
| Value | Count | Frequency (%) |
| ო | 3 | 18.8% |
| ნ | 2 | 12.5% |
| ი | 2 | 12.5% |
| რ | 2 | 12.5% |
| ა | 2 | 12.5% |
| დ | 1 | 6.2% |
| ფ | 1 | 6.2% |
| პ | 1 | 6.2% |
| ს | 1 | 6.2% |
| ტ | 1 | 6.2% |
Oriya
| Value | Count | Frequency (%) |
| ୁ | 2 | 25.0% |
| ଶ | 1 | 12.5% |
| ଆ | 1 | 12.5% |
| ନ | 1 | 12.5% |
| ବ | 1 | 12.5% |
| ା | 1 | 12.5% |
| ଦ | 1 | 12.5% |
Latin Ext Additional
| Value | Count | Frequency (%) |
| Ế | 1 | 50.0% |
| Ự | 1 | 50.0% |
Ethiopic
| Value | Count | Frequency (%) |
| ይ | 1 | 12.5% |
| ቻ | 1 | 12.5% |
| ና | 1 | 12.5% |
| ኢ | 1 | 12.5% |
| ት | 1 | 12.5% |
| ዮ | 1 | 12.5% |
| ጵ | 1 | 12.5% |
| ያ | 1 | 12.5% |
VS
| Value | Count | Frequency (%) |
| ️ | 1 | 100.0% |
Armenian
| Value | Count | Frequency (%) |
| Ե | 1 | 33.3% |
| Պ | 1 | 33.3% |
| Հ | 1 | 33.3% |
CJK Ext A
| Value | Count | Frequency (%) |
| 㬵 | 1 | 100.0% |
hashtag_lang
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 2344 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 311885 |
| Missing (%) | 62.3% |
| Memory size | 20.7 MiB |
Length
| Max length | 6 |
|---|---|
| Median length | 5 |
| Mean length | 5.0013558 |
| Min length | 4 |
Characters and Unicode
| Total characters | 944381 |
|---|---|
| Distinct characters | 36 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 746 |
|---|---|
| Unique (%) | 0.4% |
Sample
| 1st row | en 50 |
|---|---|
| 2nd row | en 48 |
| 3rd row | en 41 |
| 4th row | en 50 |
| 5th row | en 18 |
Common Values
| Value | Count | Frequency (%) |
| en 71 | 17489 | 3.5% |
| en 50 | 6022 | 1.2% |
| en 66 | 4466 | 0.9% |
| en 58 | 4344 | 0.9% |
| en 55 | 4285 | 0.9% |
| en 69 | 4124 | 0.8% |
| en 53 | 3553 | 0.7% |
| en 56 | 3529 | 0.7% |
| en 60 | 3386 | 0.7% |
| en 54 | 3259 | 0.7% |
| Other values (2334) | 134368 | 26.8% |
| (Missing) | 311885 | 62.3% |
Words
| Value | Count | Frequency (%) |
| en | 163936 | 43.4% |
| 71 | 17601 | 4.7% |
| 50 | 6221 | 1.6% |
| 66 | 5774 | 1.5% |
| id | 5025 | 1.3% |
| 55 | 4518 | 1.2% |
| 58 | 4483 | 1.2% |
| 69 | 4294 | 1.1% |
| de | 3903 | 1.0% |
| 53 | 3824 | 1.0% |
| Other values (197) | 158071 |
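Every hashtag_lang value is 4 to 6 characters long and consists of a short language code and a number separated by a single space. The token `en` appears 163,936 times across the 188,825 non-missing values. The report does not say what the number means; treating it as a per-row score is our assumption. The sketch below separates the two parts for further analysis. `split_lang_score` is a hypothetical helper, and the `"<code> <integer>"` layout is inferred from the samples above.

```python
import pandas as pd


def split_lang_score(series: pd.Series) -> pd.DataFrame:
    """Split values shaped like "en 71" into a language code and an integer.

    Missing or non-matching values become NaN / <NA> in both columns.
    """
    parts = series.str.extract(r"^(?P<lang>\S+) (?P<score>\d+)$")
    parts["score"] = pd.to_numeric(parts["score"]).astype("Int64")
    return parts


sample = pd.Series(["en 71", None, "de 5", "id 100"])
parsed = split_lang_score(sample)
print(parsed)
print(parsed["lang"].value_counts())
```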
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 188825 | 20.0% |
| e | 170314 | 18.0% |
| n | 164622 | 17.4% |
| 5 | 55725 | 5.9% |
| 6 | 54764 | 5.8% |
| 7 | 51740 | 5.5% |
| 4 | 40144 | 4.3% |
| 1 | 37593 | 4.0% |
| 3 | 37413 | 4.0% |
| 2 | 30096 | 3.2% |
| Other values (26) | 113145 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 377961 | 40.0% |
| Decimal Number | 377595 | 40.0% |
| Space Separator | 188825 | 20.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 170314 | 45.1% |
| n | 164622 | 43.6% |
| d | 9001 | 2.4% |
| i | 7896 | 2.1% |
| t | 4621 | 1.2% |
| r | 3884 | 1.0% |
| s | 3484 | 0.9% |
| a | 2182 | 0.6% |
| f | 1987 | 0.5% |
| h | 1558 | 0.4% |
| Other values (15) | 8412 | 2.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 55725 | 14.8% |
| 6 | 54764 | 14.5% |
| 7 | 51740 | 13.7% |
| 4 | 40144 | 10.6% |
| 1 | 37593 | 10.0% |
| 3 | 37413 | 9.9% |
| 2 | 30096 | 8.0% |
| 8 | 27985 | 7.4% |
| 9 | 22037 | 5.8% |
| 0 | 20098 | 5.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 188825 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 566420 | 60.0% |
| Latin | 377961 | 40.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 170314 | 45.1% |
| n | 164622 | 43.6% |
| d | 9001 | 2.4% |
| i | 7896 | 2.1% |
| t | 4621 | 1.2% |
| r | 3884 | 1.0% |
| s | 3484 | 0.9% |
| a | 2182 | 0.6% |
| f | 1987 | 0.5% |
| h | 1558 | 0.4% |
| Other values (15) | 8412 | 2.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 188825 | 33.3% |
| 5 | 55725 | 9.8% |
| 6 | 54764 | 9.7% |
| 7 | 51740 | 9.1% |
| 4 | 40144 | 7.1% |
| 1 | 37593 | 6.6% |
| 3 | 37413 | 6.6% |
| 2 | 30096 | 5.3% |
| 8 | 27985 | 4.9% |
| 9 | 22037 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 944381 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 188825 | 20.0% |
| e | 170314 | 18.0% |
| n | 164622 | 17.4% |
| 5 | 55725 | 5.9% |
| 6 | 54764 | 5.8% |
| 7 | 51740 | 5.5% |
| 4 | 40144 | 4.3% |
| 1 | 37593 | 4.0% |
| 3 | 37413 | 4.0% |
| 2 | 30096 | 3.2% |
| Other values (26) | 113145 |
hashtag_en
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 92987 |
|---|---|
| Distinct (%) | 49.2% |
| Missing | 311885 |
| Missing (%) | 62.3% |
| Memory size | 25.4 MiB |
Length
| Max length | 5099 |
|---|---|
| Median length | 594 |
| Mean length | 30.050322 |
| Min length | 1 |
Characters and Unicode
| Total characters | 5674252 |
|---|---|
| Distinct characters | 890 |
| Distinct categories | 18 |
| Distinct scripts | 21 |
| Distinct blocks | 26 |
Unique
| Unique | 80085 |
|---|---|
| Unique (%) | 42.4% |
Sample
| 1st row | China |
|---|---|
| 2nd row | Asia energy NewSilkRoad |
| 3rd row | china asia energy |
| 4th row | China |
| 5th row | Mongolia |
Common Values
| Value | Count | Frequency (%) |
| BeltandRoad | 14956 | 3.0% |
| China | 3778 | 0.8% |
| beltandroad | 1538 | 0.3% |
| China BeltandRoad | 1402 | 0.3% |
| OBOR | 1201 | 0.2% |
| BRI | 1195 | 0.2% |
| KuşakveYol BeltandRoad | 1107 | 0.2% |
| BeltAndRoad | 1082 | 0.2% |
| OneBeltOneRoad | 917 | 0.2% |
| NewSilkRoad | 691 | 0.1% |
| Other values (92977) | 160958 | 32.1% |
| (Missing) | 311885 | 62.3% |
Words
| Value | Count | Frequency (%) |
| beltandroad | 82719 | 13.2% |
| china | 55986 | 8.9% |
| bri | 18445 | 2.9% |
| obor | 17002 | 2.7% |
| onebeltoneroad | 11640 | 1.9% |
| silkroad | 6748 | 1.1% |
| cpec | 5723 | 0.9% |
| pakistan | 5216 | 0.8% |
| hongkong | 5213 | 0.8% |
| newsilkroad | 4983 | 0.8% |
| Other values (50601) | 414901 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 591458 | 10.4% |
| (space) | 439791 | 7.8% |
| n | 431498 | 7.6% |
| e | 398004 | 7.0% |
| i | 365768 | 6.4% |
| o | 332025 | 5.9% |
| t | 288126 | 5.1% |
| d | 278753 | 4.9% |
| l | 242522 | 4.3% |
| r | 213832 | 3.8% |
| Other values (880) | 2092475 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4135385 | 72.9% |
| Uppercase Letter | 1030503 | 18.2% |
| Space Separator | 439791 | 7.8% |
| Decimal Number | 44723 | 0.8% |
| Other Letter | 8341 | 0.1% |
| Other Punctuation | 5808 | 0.1% |
| Connector Punctuation | 5378 | 0.1% |
| Dash Punctuation | 3032 | 0.1% |
| Nonspacing Mark | 378 | < 0.1% |
| Other Symbol | 321 | < 0.1% |
| Other values (8) | 592 | < 0.1% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 一 | 930 | 11.1% |
| ا | 692 | 8.3% |
| 路 | 467 | 5.6% |
| 带 | 429 | 5.1% |
| ر | 351 | 4.2% |
| ی | 337 | 4.0% |
| و | 282 | 3.4% |
| ن | 258 | 3.1% |
| ت | 229 | 2.7% |
| ک | 208 | 2.5% |
| Other values (547) | 4158 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 591458 | 14.3% |
| n | 431498 | 10.4% |
| e | 398004 | 9.6% |
| i | 365768 | 8.8% |
| o | 332025 | 8.0% |
| t | 288126 | 7.0% |
| d | 278753 | 6.7% |
| l | 242522 | 5.9% |
| r | 213832 | 5.2% |
| s | 179782 | 4.3% |
| Other values (127) | 813617 |
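Each Frequency (%) column in these character tables divides a count by the total of the table it sits in, not by the column's overall character count. The same 591458 occurrences of "a" therefore read as 10.4% among all 5674252 characters, 11.5% within the Latin script (5165164 characters), and 14.3% within the Lowercase Letter category (4135385). A minimal check, assuming the one-decimal rounding the report displays:

```python
def pct(count, total):
    """Format count/total as a percentage with one decimal, as in the report."""
    return f"{100 * count / total:.1f}%"


a_count = 591458  # occurrences of 'a' in hashtag_en
for label, total in [("all characters", 5674252),
                     ("Latin script", 5165164),
                     ("Lowercase Letter", 4135385)]:
    print(f"{label}: {pct(a_count, total)}")
```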
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 160055 | 15.5% |
| R | 159624 | 15.5% |
| C | 124697 | 12.1% |
| O | 83919 | 8.1% |
| I | 58554 | 5.7% |
| A | 56795 | 5.5% |
| S | 50779 | 4.9% |
| T | 42527 | 4.1% |
| P | 40083 | 3.9% |
| E | 36762 | 3.6% |
| Other values (68) | 216708 |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ် | 91 | 24.1% |
| ု | 60 | 15.9% |
| ိ | 28 | 7.4% |
| ွ | 20 | 5.3% |
| ှ | 18 | 4.8% |
| ံ | 18 | 4.8% |
| ္ | 15 | 4.0% |
| ่ | 12 | 3.2% |
| ့ | 10 | 2.6% |
| ั | 9 | 2.4% |
| Other values (31) | 97 |
Spacing Mark
| Value | Count | Frequency (%) |
| း | 26 | 19.4% |
| ာ | 22 | 16.4% |
| ေ | 17 | 12.7% |
| ြ | 16 | 11.9% |
| ा | 8 | 6.0% |
| ி | 5 | 3.7% |
| ா | 5 | 3.7% |
| ी | 4 | 3.0% |
| ಾ | 4 | 3.0% |
| ि | 3 | 2.2% |
| Other values (17) | 24 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 2733 | 47.1% |
| . | 1834 | 31.6% |
| ' | 1010 | 17.4% |
| \ | 57 | 1.0% |
| ? | 36 | 0.6% |
| " | 28 | 0.5% |
| : | 24 | 0.4% |
| @ | 23 | 0.4% |
| · | 20 | 0.3% |
| ! | 12 | 0.2% |
| Other values (6) | 31 | 0.5% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 11191 | 25.0% |
| 2 | 8658 | 19.4% |
| 0 | 8285 | 18.5% |
| 9 | 6314 | 14.1% |
| 7 | 2418 | 5.4% |
| 5 | 2135 | 4.8% |
| 3 | 1978 | 4.4% |
| 8 | 1760 | 3.9% |
| 4 | 1134 | 2.5% |
| 6 | 841 | 1.9% |
| Other values (5) | 9 | < 0.1% |
Other Symbol
| Value | Count | Frequency (%) |
| ♪ | 312 | 97.2% |
| ▪ | 8 | 2.5% |
| ▁ | 1 | 0.3% |
Math Symbol
| Value | Count | Frequency (%) |
| = | 90 | 97.8% |
| < | 1 | 1.1% |
| > | 1 | 1.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 55 | 49.1% |
| } | 50 | 44.6% |
| ] | 7 | 6.2% |
Open Punctuation
| Value | Count | Frequency (%) |
| { | 50 | 63.3% |
| ( | 24 | 30.4% |
| [ | 5 | 6.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 439791 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 5378 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 3032 |
Modifier Letter
| Value | Count | Frequency (%) |
| ー | 158 |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 8 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 6 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ^ | 3 |
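The category names in these tables match the Unicode General Category property. Python's standard `unicodedata.category` returns the two-letter code; the long names below are written out by hand for the 18 categories that occur in hashtag_en (its Distinct categories count). That mapping is an assumption checked against the rows above, not a library lookup.

```python
import unicodedata

# Hand-written long names for the General Category codes seen in this report
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Lo": "Other Letter",
    "Lm": "Modifier Letter", "Mn": "Nonspacing Mark", "Mc": "Spacing Mark",
    "Nd": "Decimal Number", "Zs": "Space Separator", "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pf": "Final Punctuation", "Po": "Other Punctuation", "Sm": "Math Symbol",
    "Sc": "Currency Symbol", "Sk": "Modifier Symbol", "So": "Other Symbol",
}


def category_name(ch):
    """Long category name for one character, falling back to the raw code."""
    code = unicodedata.category(ch)
    return CATEGORY_NAMES.get(code, code)


# a, space, underscore, 一, ー, ♪, Myanmar visarga
for ch in ["a", " ", "_", "\u4e00", "\u30fc", "\u266a", "\u1038"]:
    print(repr(ch), category_name(ch))
```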
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5165164 | 91.0% |
| Common | 499566 | 8.8% |
| Arabic | 4001 | 0.1% |
| Han | 2968 | 0.1% |
| Myanmar | 691 | < 0.1% |
| Katakana | 664 | < 0.1% |
| Greek | 479 | < 0.1% |
| Thai | 212 | < 0.1% |
| Cyrillic | 181 | < 0.1% |
| Devanagari | 84 | < 0.1% |
| Other values (11) | 242 | < 0.1% |
Most frequent character per script
Han
| Value | Count | Frequency (%) |
| 一 | 930 | 31.3% |
| 路 | 467 | 15.7% |
| 带 | 429 | 14.5% |
| 中 | 62 | 2.1% |
| 易 | 52 | 1.8% |
| 国 | 51 | 1.7% |
| 資 | 51 | 1.7% |
| 投 | 51 | 1.7% |
| 経 | 50 | 1.7% |
| 済 | 50 | 1.7% |
| Other values (298) | 775 |
Latin
| Value | Count | Frequency (%) |
| a | 591458 | 11.5% |
| n | 431498 | 8.4% |
| e | 398004 | 7.7% |
| i | 365768 | 7.1% |
| o | 332025 | 6.4% |
| t | 288126 | 5.6% |
| d | 278753 | 5.4% |
| l | 242522 | 4.7% |
| r | 213832 | 4.1% |
| s | 179782 | 3.5% |
| Other values (103) | 1843396 |
Common
| Value | Count | Frequency (%) |
| (space) | 439791 | 88.0% |
| 1 | 11191 | 2.2% |
| 2 | 8658 | 1.7% |
| 0 | 8285 | 1.7% |
| 9 | 6314 | 1.3% |
| _ | 5378 | 1.1% |
| - | 3032 | 0.6% |
| , | 2733 | 0.5% |
| 7 | 2418 | 0.5% |
| 5 | 2135 | 0.4% |
| Other values (50) | 9631 | 1.9% |
Arabic
| Value | Count | Frequency (%) |
| ا | 692 | 17.3% |
| ر | 351 | 8.8% |
| ی | 337 | 8.4% |
| و | 282 | 7.0% |
| ن | 258 | 6.4% |
| ت | 229 | 5.7% |
| ک | 208 | 5.2% |
| م | 175 | 4.4% |
| س | 160 | 4.0% |
| د | 158 | 3.9% |
| Other values (43) | 1151 |
Greek
| Value | Count | Frequency (%) |
| α | 44 | 9.2% |
| ν | 35 | 7.3% |
| ο | 31 | 6.5% |
| ι | 25 | 5.2% |
| η | 23 | 4.8% |
| τ | 22 | 4.6% |
| ρ | 21 | 4.4% |
| ε | 20 | 4.2% |
| κ | 19 | 4.0% |
| ί | 18 | 3.8% |
| Other values (38) | 221 |
Myanmar
| Value | Count | Frequency (%) |
| ် | 91 | 13.2% |
| ု | 60 | 8.7% |
| င | 48 | 6.9% |
| န | 46 | 6.7% |
| တ | 36 | 5.2% |
| ိ | 28 | 4.1% |
| း | 26 | 3.8% |
| က | 26 | 3.8% |
| ရ | 26 | 3.8% |
| ပ | 25 | 3.6% |
| Other values (36) | 279 |
Thai
| Value | Count | Frequency (%) |
| ร | 14 | 6.6% |
| ว | 14 | 6.6% |
| า | 13 | 6.1% |
| ่ | 12 | 5.7% |
| ก | 10 | 4.7% |
| ข | 10 | 4.7% |
| ั | 9 | 4.2% |
| ง | 8 | 3.8% |
| ท | 8 | 3.8% |
| ิ | 7 | 3.3% |
| Other values (36) | 107 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 22 | 12.2% |
| м | 13 | 7.2% |
| и | 12 | 6.6% |
| н | 12 | 6.6% |
| ш | 8 | 4.4% |
| р | 8 | 4.4% |
| с | 7 | 3.9% |
| т | 7 | 3.9% |
| е | 7 | 3.9% |
| ы | 7 | 3.9% |
| Other values (29) | 78 |
Devanagari
| Value | Count | Frequency (%) |
| ा | 8 | 9.5% |
| न | 7 | 8.3% |
| व | 6 | 7.1% |
| े | 6 | 7.1% |
| ् | 5 | 6.0% |
| ी | 4 | 4.8% |
| ब | 4 | 4.8% |
| र | 3 | 3.6% |
| क | 3 | 3.6% |
| ि | 3 | 3.6% |
| Other values (21) | 35 |
Sinhala
| Value | Count | Frequency (%) |
| ි | 4 | 10.3% |
| ල | 3 | 7.7% |
| ය | 3 | 7.7% |
| න | 3 | 7.7% |
| ් | 3 | 7.7% |
| ක | 2 | 5.1% |
| ත | 2 | 5.1% |
| ස | 2 | 5.1% |
| ඹ | 1 | 2.6% |
| ට | 1 | 2.6% |
| Other values (15) | 15 |
Katakana
| Value | Count | Frequency (%) |
| ア | 100 | 15.1% |
| ン | 51 | 7.7% |
| ス | 51 | 7.7% |
| ジ | 51 | 7.7% |
| キ | 50 | 7.5% |
| パ | 50 | 7.5% |
| ダ | 50 | 7.5% |
| ム | 50 | 7.5% |
| ト | 50 | 7.5% |
| ド | 50 | 7.5% |
| Other values (13) | 111 |
Tamil
| Value | Count | Frequency (%) |
| ் | 5 | 8.1% |
| ி | 5 | 8.1% |
| ச | 5 | 8.1% |
| ா | 5 | 8.1% |
| த | 4 | 6.5% |
| ர | 4 | 6.5% |
| ப | 3 | 4.8% |
| ந | 3 | 4.8% |
| ீ | 3 | 4.8% |
| க | 3 | 4.8% |
| Other values (13) | 22 |
Kannada
| Value | Count | Frequency (%) |
| ್ | 6 | 18.2% |
| ಾ | 4 | 12.1% |
| ರ | 3 | 9.1% |
| ನ | 3 | 9.1% |
| ಒ | 2 | 6.1% |
| ತ | 1 | 3.0% |
| ಿ | 1 | 3.0% |
| ಣ | 1 | 3.0% |
| ವ | 1 | 3.0% |
| ಜ | 1 | 3.0% |
| Other values (10) | 10 |
Hebrew
| Value | Count | Frequency (%) |
| י | 12 | 26.1% |
| ס | 6 | 13.0% |
| ן | 6 | 13.0% |
| א | 5 | 10.9% |
| ה | 4 | 8.7% |
| ר | 3 | 6.5% |
| ו | 2 | 4.3% |
| פ | 2 | 4.3% |
| ק | 1 | 2.2% |
| ח | 1 | 2.2% |
| Other values (4) | 4 | 8.7% |
Hiragana
| Value | Count | Frequency (%) |
| で | 3 | 20.0% |
| ぷ | 3 | 20.0% |
| ろ | 2 | 13.3% |
| ん | 2 | 13.3% |
| ぽ | 2 | 13.3% |
| ぐ | 1 | 6.7% |
| に | 1 | 6.7% |
| ゃ | 1 | 6.7% |
Ethiopic
| Value | Count | Frequency (%) |
| ና | 1 | 12.5% |
| ይ | 1 | 12.5% |
| ቻ | 1 | 12.5% |
| ት | 1 | 12.5% |
| ዮ | 1 | 12.5% |
| ጵ | 1 | 12.5% |
| ያ | 1 | 12.5% |
| ኢ | 1 | 12.5% |
Oriya
| Value | Count | Frequency (%) |
| ୁ | 2 | 25.0% |
| ଦ | 1 | 12.5% |
| ା | 1 | 12.5% |
| ବ | 1 | 12.5% |
| ନ | 1 | 12.5% |
| ଆ | 1 | 12.5% |
| ଶ | 1 | 12.5% |
Bengali
| Value | Count | Frequency (%) |
| ত | 2 | 22.2% |
| ্ | 2 | 22.2% |
| ৰ | 1 | 11.1% |
| ন | 1 | 11.1% |
| ণ | 1 | 11.1% |
| গ | 1 | 11.1% |
| ক | 1 | 11.1% |
Hangul
| Value | Count | Frequency (%) |
| 평 | 2 | 16.7% |
| 인 | 2 | 16.7% |
| 도 | 2 | 16.7% |
| 태 | 2 | 16.7% |
| 양 | 2 | 16.7% |
| 좋 | 1 | 8.3% |
| 네 | 1 | 8.3% |
Inherited
| Value | Count | Frequency (%) |
| َ | 4 | 57.1% |
| ِ | 2 | 28.6% |
| ٍ | 1 | 14.3% |
Bopomofo
| Value | Count | Frequency (%) |
| ㄧ | 3 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5661889 | 99.8% |
| Arabic | 4008 | 0.1% |
| CJK | 2968 | 0.1% |
| None | 2773 | < 0.1% |
| Katakana | 816 | < 0.1% |
| Myanmar | 691 | < 0.1% |
| Misc Symbols | 312 | < 0.1% |
| Thai | 212 | < 0.1% |
| Cyrillic | 181 | < 0.1% |
| Devanagari | 84 | < 0.1% |
| Other values (16) | 318 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 591458 | 10.4% |
| (space) | 439791 | 7.8% |
| n | 431498 | 7.6% |
| e | 398004 | 7.0% |
| i | 365768 | 6.5% |
| o | 332025 | 5.9% |
| t | 288126 | 5.1% |
| d | 278753 | 4.9% |
| l | 242522 | 4.3% |
| r | 213832 | 3.8% |
| Other values (80) | 2080112 |
None
| Value | Count | Frequency (%) |
| ş | 1299 | 46.8% |
| ž | 252 | 9.1% |
| ö | 155 | 5.6% |
| ß | 65 | 2.3% |
| í | 50 | 1.8% |
| ü | 50 | 1.8% |
| İ | 45 | 1.6% |
| α | 44 | 1.6% |
| é | 44 | 1.6% |
| ν | 35 | 1.3% |
| Other values (105) | 734 |
CJK
| Value | Count | Frequency (%) |
| 一 | 930 | 31.3% |
| 路 | 467 | 15.7% |
| 带 | 429 | 14.5% |
| 中 | 62 | 2.1% |
| 易 | 52 | 1.8% |
| 国 | 51 | 1.7% |
| 資 | 51 | 1.7% |
| 投 | 51 | 1.7% |
| 経 | 50 | 1.7% |
| 済 | 50 | 1.7% |
| Other values (298) | 775 |
Arabic
| Value | Count | Frequency (%) |
| ا | 692 | 17.3% |
| ر | 351 | 8.8% |
| ی | 337 | 8.4% |
| و | 282 | 7.0% |
| ن | 258 | 6.4% |
| ت | 229 | 5.7% |
| ک | 208 | 5.2% |
| م | 175 | 4.4% |
| س | 160 | 4.0% |
| د | 158 | 3.9% |
| Other values (46) | 1158 |
Misc Symbols
| Value | Count | Frequency (%) |
| ♪ | 312 |
Katakana
| Value | Count | Frequency (%) |
| ー | 158 | 19.4% |
| ア | 100 | 12.3% |
| ン | 51 | 6.2% |
| ス | 51 | 6.2% |
| ジ | 51 | 6.2% |
| キ | 50 | 6.1% |
| パ | 50 | 6.1% |
| ダ | 50 | 6.1% |
| ム | 50 | 6.1% |
| ト | 50 | 6.1% |
| Other values (8) | 155 |
Myanmar
| Value | Count | Frequency (%) |
| ် | 91 | 13.2% |
| ု | 60 | 8.7% |
| င | 48 | 6.9% |
| န | 46 | 6.7% |
| တ | 36 | 5.2% |
| ိ | 28 | 4.1% |
| း | 26 | 3.8% |
| က | 26 | 3.8% |
| ရ | 26 | 3.8% |
| ပ | 25 | 3.6% |
| Other values (36) | 279 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ℑ | 45 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 22 | 12.2% |
| м | 13 | 7.2% |
| и | 12 | 6.6% |
| н | 12 | 6.6% |
| ш | 8 | 4.4% |
| р | 8 | 4.4% |
| с | 7 | 3.9% |
| т | 7 | 3.9% |
| е | 7 | 3.9% |
| ы | 7 | 3.9% |
| Other values (29) | 78 |
Thai
| Value | Count | Frequency (%) |
| ร | 14 | 6.6% |
| ว | 14 | 6.6% |
| า | 13 | 6.1% |
| ่ | 12 | 5.7% |
| ก | 10 | 4.7% |
| ข | 10 | 4.7% |
| ั | 9 | 4.2% |
| ง | 8 | 3.8% |
| ท | 8 | 3.8% |
| ิ | 7 | 3.3% |
| Other values (36) | 107 |
Hebrew
| Value | Count | Frequency (%) |
| י | 12 | 26.1% |
| ס | 6 | 13.0% |
| ן | 6 | 13.0% |
| א | 5 | 10.9% |
| ה | 4 | 8.7% |
| ר | 3 | 6.5% |
| ו | 2 | 4.3% |
| פ | 2 | 4.3% |
| ק | 1 | 2.2% |
| ח | 1 | 2.2% |
| Other values (4) | 4 | 8.7% |
Devanagari
| Value | Count | Frequency (%) |
| ा | 8 | 9.5% |
| न | 7 | 8.3% |
| व | 6 | 7.1% |
| े | 6 | 7.1% |
| ् | 5 | 6.0% |
| ी | 4 | 4.8% |
| ब | 4 | 4.8% |
| र | 3 | 3.6% |
| क | 3 | 3.6% |
| ि | 3 | 3.6% |
| Other values (21) | 35 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 8 | 88.9% |
| • | 1 | 11.1% |
Geometric Shapes
| Value | Count | Frequency (%) |
| ▪ | 8 |
Kannada
| Value | Count | Frequency (%) |
| ್ | 6 | 18.2% |
| ಾ | 4 | 12.1% |
| ರ | 3 | 9.1% |
| ನ | 3 | 9.1% |
| ಒ | 2 | 6.1% |
| ತ | 1 | 3.0% |
| ಿ | 1 | 3.0% |
| ಣ | 1 | 3.0% |
| ವ | 1 | 3.0% |
| ಜ | 1 | 3.0% |
| Other values (10) | 10 |
Tamil
| Value | Count | Frequency (%) |
| ் | 5 | 8.1% |
| ி | 5 | 8.1% |
| ச | 5 | 8.1% |
| ா | 5 | 8.1% |
| த | 4 | 6.5% |
| ர | 4 | 6.5% |
| ப | 3 | 4.8% |
| ந | 3 | 4.8% |
| ீ | 3 | 4.8% |
| க | 3 | 4.8% |
| Other values (13) | 22 |
Sinhala
| Value | Count | Frequency (%) |
| ි | 4 | 10.3% |
| ල | 3 | 7.7% |
| ය | 3 | 7.7% |
| න | 3 | 7.7% |
| ් | 3 | 7.7% |
| ක | 2 | 5.1% |
| ත | 2 | 5.1% |
| ස | 2 | 5.1% |
| ඹ | 1 | 2.6% |
| ට | 1 | 2.6% |
| Other values (15) | 15 |
Math Alphanum
| Value | Count | Frequency (%) |
| 𝗶 | 4 | 21.1% |
| 𝗻 | 3 | 15.8% |
| 𝗽 | 1 | 5.3% |
| 𝒪 | 1 | 5.3% |
| 𝒷 | 1 | 5.3% |
| 𝑒 | 1 | 5.3% |
| 𝒾 | 1 | 5.3% |
| 𝒟 | 1 | 5.3% |
| 𝗴 | 1 | 5.3% |
| 𝗝 | 1 | 5.3% |
| Other values (4) | 4 |
Bopomofo
| Value | Count | Frequency (%) |
| ㄧ | 3 |
Hiragana
| Value | Count | Frequency (%) |
| で | 3 | 20.0% |
| ぷ | 3 | 20.0% |
| ろ | 2 | 13.3% |
| ん | 2 | 13.3% |
| ぽ | 2 | 13.3% |
| ぐ | 1 | 6.7% |
| に | 1 | 6.7% |
| ゃ | 1 | 6.7% |
Oriya
| Value | Count | Frequency (%) |
| ୁ | 2 | 25.0% |
| ଦ | 1 | 12.5% |
| ା | 1 | 12.5% |
| ବ | 1 | 12.5% |
| ନ | 1 | 12.5% |
| ଆ | 1 | 12.5% |
| ଶ | 1 | 12.5% |
Bengali
| Value | Count | Frequency (%) |
| ত | 2 | 22.2% |
| ্ | 2 | 22.2% |
| ৰ | 1 | 11.1% |
| ন | 1 | 11.1% |
| ণ | 1 | 11.1% |
| গ | 1 | 11.1% |
| ক | 1 | 11.1% |
Hangul
| Value | Count | Frequency (%) |
| 평 | 2 | 16.7% |
| 인 | 2 | 16.7% |
| 도 | 2 | 16.7% |
| 태 | 2 | 16.7% |
| 양 | 2 | 16.7% |
| 좋 | 1 | 8.3% |
| 네 | 1 | 8.3% |
IPA Ext
| Value | Count | Frequency (%) |
| ə | 1 |
Ethiopic
| Value | Count | Frequency (%) |
| ና | 1 | 12.5% |
| ይ | 1 | 12.5% |
| ቻ | 1 | 12.5% |
| ት | 1 | 12.5% |
| ዮ | 1 | 12.5% |
| ጵ | 1 | 12.5% |
| ያ | 1 | 12.5% |
| ኢ | 1 | 12.5% |
Block Elements
| Value | Count | Frequency (%) |
| ▁ | 1 |
cashtag
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 415 |
|---|---|
| Distinct (%) | 33.7% |
| Missing | 499478 |
| Missing (%) | 99.8% |
| Memory size | 15.3 MiB |
| Value | Count |
|---|---|
| MAN | 82 |
| HG_F | 74 |
| v | 73 |
| ZKIN | 42 |
| OBOR | 41 |
| Other values (410) | 920 |
Length
| Max length | 142 |
|---|---|
| Median length | 107 |
| Mean length | 7.2678571 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8954 |
|---|---|
| Distinct characters | 55 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 281 |
|---|---|
| Unique (%) | 22.8% |
Sample
| 1st row | AZIA |
|---|---|
| 2nd row | MRVL |
| 3rd row | MRVL |
| 4th row | AAPL |
| 5th row | SWX |
Common Values
| Value | Count | Frequency (%) |
| MAN | 82 | < 0.1% |
| HG_F | 74 | < 0.1% |
| v | 73 | < 0.1% |
| ZKIN | 42 | < 0.1% |
| OBOR | 41 | < 0.1% |
| VET | 41 | < 0.1% |
| btc eth ltc xrp vet | 30 | < 0.1% |
| YRIV | 23 | < 0.1% |
| FB TWTR LNKD | 20 | < 0.1% |
| man | 16 | < 0.1% |
| Other values (405) | 790 | 0.2% |
| (Missing) | 499478 |
Words
| Value | Count | Frequency (%) |
| vet | 133 | 5.7% |
| man | 128 | 5.5% |
| v | 81 | 3.5% |
| hg_f | 74 | 3.2% |
| btc | 64 | 2.7% |
| obor | 54 | 2.3% |
| eth | 51 | 2.2% |
| zkin | 42 | 1.8% |
| xrp | 42 | 1.8% |
| spy | 40 | 1.7% |
| Other values (466) | 1625 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1102 | 12.3% |
| N | 395 | 4.4% |
| T | 384 | 4.3% |
| A | 381 | 4.3% |
| I | 355 | 4.0% |
| R | 351 | 3.9% |
| E | 346 | 3.9% |
| S | 312 | 3.5% |
| B | 311 | 3.5% |
| C | 308 | 3.4% |
| Other values (45) | 4709 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 6226 | 69.5% |
| Lowercase Letter | 1538 | 17.2% |
| Space Separator | 1102 | 12.3% |
| Connector Punctuation | 75 | 0.8% |
| Other Punctuation | 13 | 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 395 | 6.3% |
| T | 384 | 6.2% |
| A | 381 | 6.1% |
| I | 355 | 5.7% |
| R | 351 | 5.6% |
| E | 346 | 5.6% |
| S | 312 | 5.0% |
| B | 311 | 5.0% |
| C | 308 | 4.9% |
| H | 277 | 4.4% |
| Other values (16) | 2806 |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 219 | 14.2% |
| v | 148 | 9.6% |
| e | 130 | 8.5% |
| c | 111 | 7.2% |
| a | 102 | 6.6% |
| l | 81 | 5.3% |
| b | 79 | 5.1% |
| s | 69 | 4.5% |
| n | 63 | 4.1% |
| r | 62 | 4.0% |
| Other values (16) | 474 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1102 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 75 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 13 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 7764 | 86.7% |
| Common | 1190 | 13.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| N | 395 | 5.1% |
| T | 384 | 4.9% |
| A | 381 | 4.9% |
| I | 355 | 4.6% |
| R | 351 | 4.5% |
| E | 346 | 4.5% |
| S | 312 | 4.0% |
| B | 311 | 4.0% |
| C | 308 | 4.0% |
| H | 277 | 3.6% |
| Other values (42) | 4344 |
Common
| Value | Count | Frequency (%) |
| (space) | 1102 | 92.6% |
| _ | 75 | 6.3% |
| . | 13 | 1.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8954 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1102 | 12.3% |
| N | 395 | 4.4% |
| T | 384 | 4.3% |
| A | 381 | 4.3% |
| I | 355 | 4.0% |
| R | 351 | 3.9% |
| E | 346 | 3.9% |
| S | 312 | 3.5% |
| B | 311 | 3.5% |
| C | 308 | 3.4% |
| Other values (45) | 4709 |
media
Categorical
HIGH CARDINALITY  MISSING  UNIFORM 
| Distinct | 104798 |
|---|---|
| Distinct (%) | 95.0% |
| Missing | 390417 |
| Missing (%) | 78.0% |
| Memory size | 41.7 MiB |
| Value | Count |
|---|---|
| [Photo(previewUrl='https://pbs.twimg.com/media/CSpA0IHUYAA_dzA?format=png&name=small', fullUrl='https://pbs.twimg.com/media/CSpA0IHUYAA_dzA?format=png&name=large')] | 65 |
| [Photo(previewUrl='https://pbs.twimg.com/media/D8UmgdpXUAA7eWz?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/D8UmgdpXUAA7eWz?format=jpg&name=large')] | 45 |
| [Video(thumbnailUrl='https://pbs.twimg.com/media/EJ63E2oWsAE7-je.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1184218546018148355/vid/480x270/L-bX98i2bPwh4CiK.mp4?tag=13', bitrate=288000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1184218546018148355/pl/-MDJ4O6Ig5Fnujnk.m3u8?tag=13', bitrate=None), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1184218546018148355/vid/640x360/FDcBzFJfX7jYGnM5.mp4?tag=13', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1184218546018148355/vid/1280x720/O4PltRA1G43uFMtI.mp4?tag=13', bitrate=2176000)], duration=91.34, views=26789)] | 38 |
| [Video(thumbnailUrl='https://pbs.twimg.com/ext_tw_video_thumb/1003609615488180226/pu/img/0NXi0mnM1YVFzjzc.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/640x360/T9XoBgDvsYgsXsli.mp4?tag=3', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/320x180/ZhikSDLKuEVe9f6-.mp4?tag=3', bitrate=256000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/1280x720/SIdSejWUIl3KTOaQ.mp4?tag=3', bitrate=2176000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/pl/JIxe16BUWnl2CCfG.m3u8?tag=3', bitrate=None)], duration=89.333, views=95996)] | 31 |
| [Video(thumbnailUrl='https://pbs.twimg.com/media/EAvlOyGWsAAY4Aw.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1156266749488246784/vid/1168x656/T_4pIO2hNM8jsCEt.mp4?tag=13', bitrate=2176000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1156266749488246784/vid/640x360/diqhFPtuDXOepno2.mp4?tag=13', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1156266749488246784/vid/480x270/S9VWh7avTPotrazK.mp4?tag=13', bitrate=288000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1156266749488246784/pl/fm-wy5c_iBqe7hXN.m3u8?tag=13', bitrate=None)], duration=21.3, views=41463)] | 30 |
| Other values (104793) | 110084 |
Length
| Max length | 1605 |
|---|---|
| Median length | 164 |
| Mean length | 226.31366 |
| Min length | 164 |
Characters and Unicode
| Total characters | 24960812 |
|---|---|
| Distinct characters | 77 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 101423 |
|---|---|
| Unique (%) | 92.0% |
Sample
| 1st row | [Photo(previewUrl='https://pbs.twimg.com/media/BTrHn_aCMAAR3kf?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/BTrHn_aCMAAR3kf?format=jpg&name=large')] |
|---|---|
| 2nd row | [Photo(previewUrl='https://pbs.twimg.com/media/BUPwyGFCAAAKWGM?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/BUPwyGFCAAAKWGM?format=jpg&name=large')] |
| 3rd row | [Photo(previewUrl='https://pbs.twimg.com/media/BUR-NnMCAAE-JGQ?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/BUR-NnMCAAE-JGQ?format=jpg&name=large')] |
| 4th row | [Photo(previewUrl='https://pbs.twimg.com/media/BUefpwHCYAAP7pN?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/BUefpwHCYAAP7pN?format=jpg&name=large')] |
| 5th row | [Photo(previewUrl='https://pbs.twimg.com/media/BUnHmU6CcAEEh8Z?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/BUnHmU6CcAEEh8Z?format=jpg&name=large')] |
Common Values
| Value | Count | Frequency (%) |
| [Photo(previewUrl='https://pbs.twimg.com/media/CSpA0IHUYAA_dzA?format=png&name=small', fullUrl='https://pbs.twimg.com/media/CSpA0IHUYAA_dzA?format=png&name=large')] | 65 | < 0.1% |
| [Photo(previewUrl='https://pbs.twimg.com/media/D8UmgdpXUAA7eWz?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/D8UmgdpXUAA7eWz?format=jpg&name=large')] | 45 | < 0.1% |
| [Video(thumbnailUrl='https://pbs.twimg.com/media/EJ63E2oWsAE7-je.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1184218546018148355/vid/480x270/L-bX98i2bPwh4CiK.mp4?tag=13', bitrate=288000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1184218546018148355/pl/-MDJ4O6Ig5Fnujnk.m3u8?tag=13', bitrate=None), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1184218546018148355/vid/640x360/FDcBzFJfX7jYGnM5.mp4?tag=13', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1184218546018148355/vid/1280x720/O4PltRA1G43uFMtI.mp4?tag=13', bitrate=2176000)], duration=91.34, views=26789)] | 38 | < 0.1% |
| [Video(thumbnailUrl='https://pbs.twimg.com/ext_tw_video_thumb/1003609615488180226/pu/img/0NXi0mnM1YVFzjzc.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/640x360/T9XoBgDvsYgsXsli.mp4?tag=3', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/320x180/ZhikSDLKuEVe9f6-.mp4?tag=3', bitrate=256000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/1280x720/SIdSejWUIl3KTOaQ.mp4?tag=3', bitrate=2176000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/ext_tw_video/1003609615488180226/pu/pl/JIxe16BUWnl2CCfG.m3u8?tag=3', bitrate=None)], duration=89.333, views=95996)] | 31 | < 0.1% |
| [Video(thumbnailUrl='https://pbs.twimg.com/media/EAvlOyGWsAAY4Aw.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1156266749488246784/vid/1168x656/T_4pIO2hNM8jsCEt.mp4?tag=13', bitrate=2176000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1156266749488246784/vid/640x360/diqhFPtuDXOepno2.mp4?tag=13', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1156266749488246784/vid/480x270/S9VWh7avTPotrazK.mp4?tag=13', bitrate=288000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1156266749488246784/pl/fm-wy5c_iBqe7hXN.m3u8?tag=13', bitrate=None)], duration=21.3, views=41463)] | 30 | < 0.1% |
| [Photo(previewUrl='https://pbs.twimg.com/media/Clhyx46UgAQ4kxq?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/Clhyx46UgAQ4kxq?format=jpg&name=large')] | 26 | < 0.1% |
| [Photo(previewUrl='https://pbs.twimg.com/media/DJF4bYKUMAAaFHc?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/DJF4bYKUMAAaFHc?format=jpg&name=large')] | 24 | < 0.1% |
| [Video(thumbnailUrl='https://pbs.twimg.com/media/D8Z9SQ2XkAACelN.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1136733439720337408/vid/640x360/oXlcm9mPn3ILwVpF.mp4?tag=13', bitrate=832000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1136733439720337408/vid/480x270/-CXyBmUcw1_MND8W.mp4?tag=13', bitrate=288000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1136733439720337408/vid/1280x720/yTSHxuDrasbSM92H.mp4?tag=13', bitrate=2176000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1136733439720337408/pl/iPJLyN2uhNG1THwx.m3u8?tag=13', bitrate=None)], duration=92.14, views=24300)] | 24 | < 0.1% |
| [Video(thumbnailUrl='https://pbs.twimg.com/media/D40bercUYAA9-d-.jpg', variants=[VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1120581640512606208/pl/xhGe2nJ6_-0cTojL.m3u8?tag=11', bitrate=None), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1120581640512606208/vid/320x180/N8SF_arpZmDVwPdA.mp4?tag=11', bitrate=288000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1120581640512606208/vid/640x360/84-tXOQAcXJVsG0Q.mp4?tag=11', bitrate=832000)], duration=389.12, views=22715)] | 21 | < 0.1% |
| [Photo(previewUrl='https://pbs.twimg.com/media/B5ANn_iIgAAdMIw?format=jpg&name=small', fullUrl='https://pbs.twimg.com/media/B5ANn_iIgAAdMIw?format=jpg&name=large')] | 20 | < 0.1% |
| Other values (104788) | 109969 | 22.0% |
| (Missing) | 390417 |
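The media strings look like Python reprs of scraper objects named Photo, Video and VideoVariant. Comparing rows suggests how the URL columns were derived: image_url holds the Photo fullUrl values (space-joined when a tweet carries several images), and video_url holds the highest-bitrate video/mp4 variant. For the 389.12-second video listed above, that is the 640x360 file at bitrate 832000, the same URL that appears in the video_url table. The regex sketch below reproduces that inferred rule; the patterns assume exactly the field order shown in these rows, and both helper names are hypothetical.

```python
import re

# Patterns for the repr-style strings in the media column (layout assumed from the rows above)
FULL_URL = re.compile(r"fullUrl='([^']*)'")
VARIANT = re.compile(
    r"VideoVariant\(contentType='([^']*)', url='([^']*)', bitrate=(\d+|None)\)"
)


def image_urls(media):
    """Hypothetical helper: space-join every Photo fullUrl in one media cell."""
    return " ".join(FULL_URL.findall(media))


def best_mp4_url(media):
    """Hypothetical helper: URL of the highest-bitrate video/mp4 variant, or None."""
    best = None  # (bitrate, url)
    for content_type, url, bitrate in VARIANT.findall(media):
        if content_type != "video/mp4" or bitrate == "None":
            continue  # skip HLS playlists (.m3u8), which carry no bitrate
        if best is None or int(bitrate) > best[0]:
            best = (int(bitrate), url)
    return best[1] if best else None


video = (
    "[Video(thumbnailUrl='https://pbs.twimg.com/media/D40bercUYAA9-d-.jpg', variants=["
    "VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/amplify_video/1120581640512606208/pl/xhGe2nJ6_-0cTojL.m3u8?tag=11', bitrate=None), "
    "VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1120581640512606208/vid/320x180/N8SF_arpZmDVwPdA.mp4?tag=11', bitrate=288000), "
    "VideoVariant(contentType='video/mp4', url='https://video.twimg.com/amplify_video/1120581640512606208/vid/640x360/84-tXOQAcXJVsG0Q.mp4?tag=11', bitrate=832000)], "
    "duration=389.12, views=22715)]"
)
print(best_mp4_url(video))
```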
Words
| Value | Count | Frequency (%) |
| videovariant(contenttype='video/mp4 | 11374 | 3.4% |
| bitrate=none | 5641 | 1.7% |
| bitrate=832000 | 5300 | 1.6% |
| variants=[videovariant(contenttype='video/mp4 | 4995 | 1.5% |
| videovariant(contenttype='application/x-mpegurl | 4137 | 1.2% |
| bitrate=2176000 | 4042 | 1.2% |
| bitrate=256000 | 2960 | 0.9% |
| variants=[videovariant(contenttype='application/x-mpegurl | 1504 | 0.4% |
| bitrate=288000 | 1419 | 0.4% |
| bitrate=0 | 858 | 0.3% |
| Other values (269781) | 292601 |
Most occurring characters
| Value | Count | Frequency (%) |
| m | 1555076 | 6.2% |
| t | 1452814 | 5.8% |
| / | 1223179 | 4.9% |
| a | 1179841 | 4.7% |
| e | 1120571 | 4.5% |
| p | 1068385 | 4.3% |
| l | 980944 | 3.9% |
| o | 975644 | 3.9% |
| i | 907766 | 3.6% |
| r | 883673 | 3.5% |
| Other values (67) | 13612919 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 15629193 | 62.6% |
| Other Punctuation | 3443147 | 13.8% |
| Uppercase Letter | 2758705 | 11.1% |
| Decimal Number | 1336692 | 5.4% |
| Math Symbol | 860945 | 3.4% |
| Close Punctuation | 270811 | 1.1% |
| Open Punctuation | 270811 | 1.1% |
| Space Separator | 224538 | 0.9% |
| Connector Punctuation | 113291 | 0.5% |
| Dash Punctuation | 52679 | 0.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| m | 1555076 | 9.9% |
| t | 1452814 | 9.3% |
| a | 1179841 | 7.5% |
| e | 1120571 | 7.2% |
| p | 1068385 | 6.8% |
| l | 980944 | 6.3% |
| o | 975644 | 6.2% |
| i | 907766 | 5.8% |
| r | 883673 | 5.7% |
| g | 746556 | 4.8% |
| Other values (16) | 4757923 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 465035 | 16.9% |
| U | 403327 | 14.6% |
| E | 183461 | 6.7% |
| D | 175417 | 6.4% |
| P | 169761 | 6.2% |
| V | 137472 | 5.0% |
| X | 117600 | 4.3% |
| W | 111486 | 4.0% |
| C | 98474 | 3.6% |
| I | 76173 | 2.8% |
| Other values (16) | 820499 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 205364 | 15.4% |
| 4 | 169476 | 12.7% |
| 1 | 149097 | 11.2% |
| 2 | 138252 | 10.3% |
| 8 | 131725 | 9.9% |
| 3 | 122459 | 9.2% |
| 6 | 118462 | 8.9% |
| 5 | 105880 | 7.9% |
| 7 | 101750 | 7.6% |
| 9 | 94227 | 7.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 1223179 | 35.5% |
| ' | 603078 | 17.5% |
| . | 593208 | 17.2% |
| : | 279529 | 8.1% |
| ? | 268126 | 7.8% |
| & | 251489 | 7.3% |
| , | 224538 | 6.5% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 154019 | 56.9% |
| ] | 116792 | 43.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 154019 | 56.9% |
| [ | 116792 | 43.1% |
Math Symbol
| Value | Count | Frequency (%) |
| = | 860945 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 224538 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 113291 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 52679 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 18387898 | 73.7% |
| Common | 6572914 | 26.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| m | 1555076 | 8.5% |
| t | 1452814 | 7.9% |
| a | 1179841 | 6.4% |
| e | 1120571 | 6.1% |
| p | 1068385 | 5.8% |
| l | 980944 | 5.3% |
| o | 975644 | 5.3% |
| i | 907766 | 4.9% |
| r | 883673 | 4.8% |
| g | 746556 | 4.1% |
| Other values (42) | 7516628 |
Common
| Value | Count | Frequency (%) |
| / | 1223179 | 18.6% |
| = | 860945 | 13.1% |
| ' | 603078 | 9.2% |
| . | 593208 | 9.0% |
| : | 279529 | 4.3% |
| ? | 268126 | 4.1% |
| & | 251489 | 3.8% |
| , | 224538 | 3.4% |
| (space) | 224538 | 3.4% |
| 0 | 205364 | 3.1% |
| Other values (15) | 1838920 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24960812 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| m | 1555076 | 6.2% |
| t | 1452814 | 5.8% |
| / | 1223179 | 4.9% |
| a | 1179841 | 4.7% |
| e | 1120571 | 4.5% |
| p | 1068385 | 4.3% |
| l | 980944 | 3.9% |
| o | 975644 | 3.9% |
| i | 907766 | 3.6% |
| r | 883673 | 3.5% |
| Other values (67) | 13612919 |
image_url
URL
| Distinct | 99196 |
|---|---|
| Distinct (%) | 95.6% |
| Missing | 396933 |
| Missing (%) | 79.3% |
| Memory size | 25.6 MiB |
| Value | Count |
|---|---|
| https://pbs.twimg.com/media/CSpA0IHUYAA_dzA?format=png&name=large | 65 |
| https://pbs.twimg.com/media/D8UmgdpXUAA7eWz?format=jpg&name=large | 45 |
| https://pbs.twimg.com/media/Clhyx46UgAQ4kxq?format=jpg&name=large | 26 |
| https://pbs.twimg.com/media/DJF4bYKUMAAaFHc?format=jpg&name=large | 24 |
| https://pbs.twimg.com/media/B5ANn_iIgAAdMIw?format=jpg&name=large | 20 |
| Other values (99191) | 103597 |
| (Missing) | 396933 |
Common Values
| Value | Count | Frequency (%) |
| https://pbs.twimg.com/media/CSpA0IHUYAA_dzA?format=png&name=large | 65 | < 0.1% |
| https://pbs.twimg.com/media/D8UmgdpXUAA7eWz?format=jpg&name=large | 45 | < 0.1% |
| https://pbs.twimg.com/media/Clhyx46UgAQ4kxq?format=jpg&name=large | 26 | < 0.1% |
| https://pbs.twimg.com/media/DJF4bYKUMAAaFHc?format=jpg&name=large | 24 | < 0.1% |
| https://pbs.twimg.com/media/B5ANn_iIgAAdMIw?format=jpg&name=large | 20 | < 0.1% |
| https://pbs.twimg.com/media/DmGZINgUYAUeixn?format=jpg&name=large | 19 | < 0.1% |
| https://pbs.twimg.com/media/DOlbpuUX0AEQmYk?format=jpg&name=large | 15 | < 0.1% |
| https://pbs.twimg.com/media/DG9OtPrWsAAAlBk?format=jpg&name=large | 13 | < 0.1% |
| https://pbs.twimg.com/media/DECeGFwW0AAr_XT?format=jpg&name=large | 13 | < 0.1% |
| https://pbs.twimg.com/media/DF4mJalWAAQOTjw?format=jpg&name=large | 13 | < 0.1% |
| Other values (99186) | 103524 | 20.7% |
| (Missing) | 396933 |
Scheme
| Value | Count | Frequency (%) |
| https | 103777 | 20.7% |
| (Missing) | 396933 |
Netloc
| Value | Count | Frequency (%) |
| pbs.twimg.com | 103777 | 20.7% |
| (Missing) | 396933 |
Path
| Value | Count | Frequency (%) |
| /media/CSpA0IHUYAA_dzA | 65 | < 0.1% |
| /media/D8UmgdpXUAA7eWz | 45 | < 0.1% |
| /media/Clhyx46UgAQ4kxq | 26 | < 0.1% |
| /media/DJF4bYKUMAAaFHc | 24 | < 0.1% |
| /media/B5ANn_iIgAAdMIw | 20 | < 0.1% |
| /media/DmGZINgUYAUeixn | 19 | < 0.1% |
| /media/DOlbpuUX0AEQmYk | 15 | < 0.1% |
| /media/DECeGFwW0AAr_XT | 13 | < 0.1% |
| /media/DFkpUm-VwAAy5hx | 13 | < 0.1% |
| /media/DDPgqODXoAE9whg | 13 | < 0.1% |
| Other values (99185) | 103524 | 20.7% |
| (Missing) | 396933 |
Query
| Value | Count | Frequency (%) |
| format=jpg&name=large | 87779 | 17.5% |
| format=png&name=large | 4912 | 1.0% |
| format=jpg&name=large https://pbs.twimg.com/media/C_xoUKVVwAAcDr7?format=jpg&name=large https://pbs.twimg.com/media/C_xoUKXUMAAw5l3?format=jpg&name=large | 10 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/DLRTxEuUEAAYV3p?format=jpg&name=large https://pbs.twimg.com/media/DLRTxalUIAAUBZ-?format=jpg&name=large | 9 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/DLRbaM_UEAAjX1a?format=jpg&name=large | 9 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/C_ytNgqXkAAdVDO?format=jpg&name=large | 9 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/Bu7tHVtIUAE4u55?format=jpg&name=large | 8 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/Da-0hj5VMAA0ztg?format=jpg&name=large https://pbs.twimg.com/media/Da-0jlHVAAAi9Ie?format=jpg&name=large https://pbs.twimg.com/media/Da-0ltsU8AIZuco?format=jpg&name=large | 7 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/Cwj0DaRXEAA71Um?format=jpg&name=large | 6 | < 0.1% |
| format=jpg&name=large https://pbs.twimg.com/media/D416neXWAAAW6bo?format=jpg&name=large https://pbs.twimg.com/media/D416neVWwAA1yOK?format=jpg&name=large https://pbs.twimg.com/media/D416ngMW4AEV1Gc?format=jpg&name=large | 6 | < 0.1% |
| Other values (10661) | 11022 | 2.2% |
| (Missing) | 396933 | 79.3% |
Fragment
| Value | Count | Frequency (%) |
| (empty) | 103777 | 20.7% |
| (Missing) | 396933 | 79.3% |
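The tables after the full-URL counts split each image_url value into its scheme (https), host (pbs.twimg.com), path, query and an always-empty fragment. A minimal sketch with Python's standard `urllib.parse.urlsplit` reproduces those components; we assume, without having checked the profiler's source, that it performs an equivalent split. It also suggests why some query values contain whole URLs: a cell holding several space-separated URLs is parsed as one string, so everything after the first `?` lands in the query.

```python
from urllib.parse import urlsplit

def url_parts(url: str) -> dict:
    """Split one URL string into the components tabulated in the report."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,  # empty string when the URL has no '#'
    }

# A single image URL taken from the table above
single = "https://pbs.twimg.com/media/DJF4bYKUMAAaFHc?format=jpg&name=large"

# A hypothetical cell holding two space-separated URLs
multi = (
    "https://pbs.twimg.com/media/C_xoUKVVwAAcDr7?format=jpg&name=large "
    "https://pbs.twimg.com/media/C_xoUKXUMAAw5l3?format=jpg&name=large"
)

print(url_parts(single))
print(url_parts(multi)["query"])  # the second URL ends up inside the query
```

The same split explains the video_url and GIF_url tables below, where every fragment is empty and URLs without a `?tag=` suffix have an empty query.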
video_url
URL
| Distinct | 4744 |
|---|---|
| Distinct (%) | 84.2% |
| Missing | 495073 |
| Missing (%) | 98.9% |
| Memory size | 15.9 MiB |
| https://video.twimg.com/amplify_video/1184218546018148355/vid/1280x720/O4PltRA1G43uFMtI.mp4?tag=13 | 38 |
|---|---|
| https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/1280x720/SIdSejWUIl3KTOaQ.mp4?tag=3 | 31 |
| https://video.twimg.com/amplify_video/1156266749488246784/vid/1168x656/T_4pIO2hNM8jsCEt.mp4?tag=13 | 30 |
| https://video.twimg.com/amplify_video/1136733439720337408/vid/1280x720/yTSHxuDrasbSM92H.mp4?tag=13 | 24 |
| https://video.twimg.com/amplify_video/1120581640512606208/vid/640x360/84-tXOQAcXJVsG0Q.mp4?tag=11 | 21 |
| Other values (4739) | 5493 |
| (Missing) | 495073 |
Full
| Value | Count | Frequency (%) |
| https://video.twimg.com/amplify_video/1184218546018148355/vid/1280x720/O4PltRA1G43uFMtI.mp4?tag=13 | 38 | < 0.1% |
| https://video.twimg.com/ext_tw_video/1003609615488180226/pu/vid/1280x720/SIdSejWUIl3KTOaQ.mp4?tag=3 | 31 | < 0.1% |
| https://video.twimg.com/amplify_video/1156266749488246784/vid/1168x656/T_4pIO2hNM8jsCEt.mp4?tag=13 | 30 | < 0.1% |
| https://video.twimg.com/amplify_video/1136733439720337408/vid/1280x720/yTSHxuDrasbSM92H.mp4?tag=13 | 24 | < 0.1% |
| https://video.twimg.com/amplify_video/1120581640512606208/vid/640x360/84-tXOQAcXJVsG0Q.mp4?tag=11 | 21 | < 0.1% |
| https://video.twimg.com/amplify_video/1009755756718100480/vid/1280x720/h_qCyWfNIn9lkKRg.mp4?tag=2 | 14 | < 0.1% |
| https://video.twimg.com/ext_tw_video/826411685716119553/pu/vid/720x720/CY6ZuhWHvvH5-LF7.mp4 | 14 | < 0.1% |
| https://video.twimg.com/amplify_video/1103613439702908933/vid/1280x720/jS1WuEbm7noMgzDQ.mp4?tag=9 | 11 | < 0.1% |
| https://video.twimg.com/amplify_video/1121616545199742976/vid/1280x720/jee5ElylvJ4z8znL.mp4?tag=11 | 11 | < 0.1% |
| https://video.twimg.com/ext_tw_video/981500388368158721/pu/vid/720x720/Db4h3kCuBZpGw5me.mp4?tag=2 | 10 | < 0.1% |
| Other values (4734) | 5433 | 1.1% |
| (Missing) | 495073 | 98.9% |
Scheme
| Value | Count | Frequency (%) |
| https | 5637 | 1.1% |
| (Missing) | 495073 | 98.9% |
Netloc
| Value | Count | Frequency (%) |
| video.twimg.com | 5637 | 1.1% |
| (Missing) | 495073 | 98.9% |
Path
| Value | Count | Frequency (%) |
| /amplify_video/1184218546018148355/vid/1280x720/O4PltRA1G43uFMtI.mp4 | 38 | < 0.1% |
| /ext_tw_video/1003609615488180226/pu/vid/1280x720/SIdSejWUIl3KTOaQ.mp4 | 31 | < 0.1% |
| /amplify_video/1156266749488246784/vid/1168x656/T_4pIO2hNM8jsCEt.mp4 | 30 | < 0.1% |
| /amplify_video/1136733439720337408/vid/1280x720/yTSHxuDrasbSM92H.mp4 | 24 | < 0.1% |
| /amplify_video/1120581640512606208/vid/640x360/84-tXOQAcXJVsG0Q.mp4 | 21 | < 0.1% |
| /ext_tw_video/826411685716119553/pu/vid/720x720/CY6ZuhWHvvH5-LF7.mp4 | 14 | < 0.1% |
| /amplify_video/1009755756718100480/vid/1280x720/h_qCyWfNIn9lkKRg.mp4 | 14 | < 0.1% |
| /amplify_video/1103613439702908933/vid/1280x720/jS1WuEbm7noMgzDQ.mp4 | 11 | < 0.1% |
| /amplify_video/1121616545199742976/vid/1280x720/jee5ElylvJ4z8znL.mp4 | 11 | < 0.1% |
| /ext_tw_video/981500388368158721/pu/vid/720x720/Db4h3kCuBZpGw5me.mp4 | 10 | < 0.1% |
| Other values (4734) | 5433 | 1.1% |
| (Missing) | 495073 | 98.9% |
Query
| Value | Count | Frequency (%) |
| (empty) | 1120 | 0.2% |
| tag=10 | 1024 | 0.2% |
| tag=12 | 734 | 0.1% |
| tag=8 | 665 | 0.1% |
| tag=13 | 507 | 0.1% |
| tag=11 | 396 | 0.1% |
| tag=5 | 264 | 0.1% |
| tag=9 | 254 | 0.1% |
| tag=14 | 201 | < 0.1% |
| tag=3 | 192 | < 0.1% |
| Other values (4) | 280 | 0.1% |
| (Missing) | 495073 | 98.9% |
Fragment
| Value | Count | Frequency (%) |
| (empty) | 5637 | 1.1% |
| (Missing) | 495073 | 98.9% |
GIF_url
URL
| Distinct | 837 |
|---|---|
| Distinct (%) | 97.6% |
| Missing | 499852 |
| Missing (%) | 99.8% |
| Memory size | 15.3 MiB |
| https://video.twimg.com/tweet_video/D--tSrjX4AEGStU.mp4 | 10 |
|---|---|
| https://video.twimg.com/tweet_video/DqRPm_FXcAAXX-p.mp4 | 3 |
| https://video.twimg.com/tweet_video/Dc1ma9VW4AAp1SY.mp4 | 3 |
| https://video.twimg.com/tweet_video/CsVQWuDWYAAZN9_.mp4 | 3 |
| https://video.twimg.com/tweet_video/DNmyBgjVoAAowjv.mp4 | 2 |
| Other values (832) | 837 |
| (Missing) | 499852 |
Full
| Value | Count | Frequency (%) |
| https://video.twimg.com/tweet_video/D--tSrjX4AEGStU.mp4 | 10 | < 0.1% |
| https://video.twimg.com/tweet_video/DqRPm_FXcAAXX-p.mp4 | 3 | < 0.1% |
| https://video.twimg.com/tweet_video/Dc1ma9VW4AAp1SY.mp4 | 3 | < 0.1% |
| https://video.twimg.com/tweet_video/CsVQWuDWYAAZN9_.mp4 | 3 | < 0.1% |
| https://video.twimg.com/tweet_video/DNmyBgjVoAAowjv.mp4 | 2 | < 0.1% |
| https://video.twimg.com/tweet_video/DQsnr-dXkAAoz_b.mp4 | 2 | < 0.1% |
| https://video.twimg.com/tweet_video/C_YGhKPXUAEvDmh.mp4 | 2 | < 0.1% |
| https://video.twimg.com/tweet_video/Dkd5VGAX0AIsqIQ.mp4 | 2 | < 0.1% |
| https://video.twimg.com/tweet_video/D40scJvUYAIg6CR.mp4 | 2 | < 0.1% |
| https://video.twimg.com/tweet_video/Cite_WPUkAEXgal.mp4 | 2 | < 0.1% |
| Other values (827) | 827 | 0.2% |
| (Missing) | 499852 | 99.8% |
Scheme
| Value | Count | Frequency (%) |
| https | 858 | 0.2% |
| (Missing) | 499852 | 99.8% |
Netloc
| Value | Count | Frequency (%) |
| video.twimg.com | 858 | 0.2% |
| (Missing) | 499852 | 99.8% |
Path
| Value | Count | Frequency (%) |
| /tweet_video/D--tSrjX4AEGStU.mp4 | 10 | < 0.1% |
| /tweet_video/DqRPm_FXcAAXX-p.mp4 | 3 | < 0.1% |
| /tweet_video/Dc1ma9VW4AAp1SY.mp4 | 3 | < 0.1% |
| /tweet_video/CsVQWuDWYAAZN9_.mp4 | 3 | < 0.1% |
| /tweet_video/DNmyBgjVoAAowjv.mp4 | 2 | < 0.1% |
| /tweet_video/DQsnr-dXkAAoz_b.mp4 | 2 | < 0.1% |
| /tweet_video/C_YGhKPXUAEvDmh.mp4 | 2 | < 0.1% |
| /tweet_video/Dkd5VGAX0AIsqIQ.mp4 | 2 | < 0.1% |
| /tweet_video/D40scJvUYAIg6CR.mp4 | 2 | < 0.1% |
| /tweet_video/Cite_WPUkAEXgal.mp4 | 2 | < 0.1% |
| Other values (827) | 827 | 0.2% |
| (Missing) | 499852 | 99.8% |
Query
| Value | Count | Frequency (%) |
| (empty) | 858 | 0.2% |
| (Missing) | 499852 | 99.8% |
Fragment
| Value | Count | Frequency (%) |
| (empty) | 858 | 0.2% |
| (Missing) | 499852 | 99.8% |
likes
Real number (ℝ)
SKEWED  ZEROS 
| Distinct | 921 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.9468355 |
| Minimum | 0 |
|---|---|
| Maximum | 23288 |
| Zeros | 314331 |
| Zeros (%) | 62.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 13 |
| Maximum | 23288 |
| Range | 23288 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 71.188337 |
|---|---|
| Coefficient of variation (CV) | 14.390682 |
| Kurtosis | 38111.048 |
| Mean | 4.9468355 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 143.3288 |
| Sum | 2476930 |
| Variance | 5067.7794 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 314331 | 62.8% |
| 1 | 71118 | 14.2% |
| 2 | 30286 | 6.0% |
| 3 | 17210 | 3.4% |
| 4 | 11075 | 2.2% |
| 5 | 7884 | 1.6% |
| 6 | 5999 | 1.2% |
| 7 | 4641 | 0.9% |
| 8 | 3802 | 0.8% |
| 9 | 2926 | 0.6% |
| Other values (911) | 31438 | 6.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 314331 | 62.8% |
| 1 | 71118 | 14.2% |
| 2 | 30286 | 6.0% |
| 3 | 17210 | 3.4% |
| 4 | 11075 | 2.2% |
| 5 | 7884 | 1.6% |
| 6 | 5999 | 1.2% |
| 7 | 4641 | 0.9% |
| 8 | 3802 | 0.8% |
| 9 | 2926 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 23288 | 1 | < 0.1% |
| 20490 | 1 | < 0.1% |
| 7020 | 1 | < 0.1% |
| 7012 | 1 | < 0.1% |
| 5868 | 1 | < 0.1% |
| 5631 | 1 | < 0.1% |
| 5456 | 1 | < 0.1% |
| 5284 | 1 | < 0.1% |
| 5233 | 1 | < 0.1% |
| 5214 | 1 | < 0.1% |
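The likes statistics follow standard definitions: the coefficient of variation is the standard deviation divided by the mean (71.188337 / 4.9468355 ≈ 14.39, matching the CV row), the IQR is Q3 - Q1, and the MAD is the median of the absolute deviations from the median. The sketch below recomputes these with Python's `statistics` module on a small made-up sample shaped like this column (mostly zeros, one large outlier). It assumes the sample (n - 1) standard deviation and linearly interpolated quartiles, which are pandas' defaults; the profiler's exact choices may differ.

```python
import statistics

def summarize(values):
    """Recompute several of the report's numeric summary statistics."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Quartiles by linear interpolation between order statistics ("inclusive" method)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mad": statistics.median(abs(v - median) for v in values),
        "zeros_pct": 100 * sum(v == 0 for v in values) / len(values),
    }

# Hypothetical like counts: mostly zeros with one outlier
likes_sample = [0, 0, 0, 0, 0, 0, 1, 1, 2, 13]
for name, value in summarize(likes_sample).items():
    print(f"{name}: {value:.4g}")
```

As in the report, a median and Q1 of 0 force the MAD to 0 once more than half the values are zero, so the CV and IQR carry most of the information about spread.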
retweets
Real number (ℝ)
SKEWED  ZEROS 
| Distinct | 450 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.9416009 |
| Minimum | 0 |
|---|---|
| Maximum | 7505 |
| Zeros | 359328 |
| Zeros (%) | 71.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 7 |
| Maximum | 7505 |
| Range | 7505 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 23.135042 |
|---|---|
| Coefficient of variation (CV) | 11.915446 |
| Kurtosis | 45718.138 |
| Mean | 1.9416009 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 167.99774 |
| Sum | 972179 |
| Variance | 535.23016 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 359328 | 71.8% |
| 1 | 56898 | 11.4% |
| 2 | 24696 | 4.9% |
| 3 | 13963 | 2.8% |
| 4 | 8765 | 1.8% |
| 5 | 6157 | 1.2% |
| 6 | 4627 | 0.9% |
| 7 | 3425 | 0.7% |
| 8 | 2688 | 0.5% |
| 9 | 2198 | 0.4% |
| Other values (440) | 17965 | 3.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 359328 | 71.8% |
| 1 | 56898 | 11.4% |
| 2 | 24696 | 4.9% |
| 3 | 13963 | 2.8% |
| 4 | 8765 | 1.8% |
| 5 | 6157 | 1.2% |
| 6 | 4627 | 0.9% |
| 7 | 3425 | 0.7% |
| 8 | 2688 | 0.5% |
| 9 | 2198 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 7505 | 1 | < 0.1% |
| 6871 | 1 | < 0.1% |
| 5576 | 1 | < 0.1% |
| 2866 | 1 | < 0.1% |
| 2457 | 1 | < 0.1% |
| 2015 | 1 | < 0.1% |
| 1783 | 1 | < 0.1% |
| 1747 | 1 | < 0.1% |
| 1461 | 1 | < 0.1% |
| 1453 | 1 | < 0.1% |
replies
Real number (ℝ)
SKEWED  ZEROS 
| Distinct | 175 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.39902738 |
| Minimum | 0 |
|---|---|
| Maximum | 4975 |
| Zeros | 428227 |
| Zeros (%) | 85.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 4975 |
| Range | 4975 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 8.7456224 |
|---|---|
| Coefficient of variation (CV) | 21.917349 |
| Kurtosis | 215275.06 |
| Mean | 0.39902738 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 403.27883 |
| Sum | 199797 |
| Variance | 76.485912 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 428227 | 85.5% |
| 1 | 49537 | 9.9% |
| 2 | 10234 | 2.0% |
| 3 | 4052 | 0.8% |
| 4 | 2182 | 0.4% |
| 5 | 1432 | 0.3% |
| 6 | 968 | 0.2% |
| 7 | 691 | 0.1% |
| 8 | 490 | 0.1% |
| 9 | 351 | 0.1% |
| Other values (165) | 2546 | 0.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 428227 | 85.5% |
| 1 | 49537 | 9.9% |
| 2 | 10234 | 2.0% |
| 3 | 4052 | 0.8% |
| 4 | 2182 | 0.4% |
| 5 | 1432 | 0.3% |
| 6 | 968 | 0.2% |
| 7 | 691 | 0.1% |
| 8 | 490 | 0.1% |
| 9 | 351 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4975 | 1 | < 0.1% |
| 1966 | 1 | < 0.1% |
| 1077 | 1 | < 0.1% |
| 891 | 1 | < 0.1% |
| 783 | 1 | < 0.1% |
| 731 | 1 | < 0.1% |
| 638 | 1 | < 0.1% |
| 586 | 1 | < 0.1% |
| 563 | 1 | < 0.1% |
| 459 | 1 | < 0.1% |
reply_to_user
Unsupported
MISSING  REJECTED  UNSUPPORTED 
| Missing | 433060 |
|---|---|
| Missing (%) | 86.5% |
| Memory size | 15.3 MiB |
mentioned_users
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 62698 |
|---|---|
| Distinct (%) | 41.0% |
| Missing | 347742 |
| Missing (%) | 69.4% |
| Memory size | 21.8 MiB |
| 10228272 | 6343 |
|---|---|
| 23922797 | 2117 |
| 995000000000000000 | 1454 |
| 39922594 | 1427 |
| 487118986 | 1279 |
| Other values (62693) | 140348 |
Length
| Max length | 697 |
|---|---|
| Median length | 691 |
| Mean length | 19.968392 |
| Min length | 2 |
Characters and Unicode
| Total characters | 3054525 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 52139 |
|---|---|
| Unique (%) | 34.1% |
Sample
| 1st row | 83521919 |
|---|---|
| 2nd row | 1050000000000000000 |
| 3rd row | 1151681138 |
| 4th row | 1050000000000000000 |
| 5th row | 636689752 25159286 28109695 93540572 55725588 1348980000000000000 21425858 477017537 |
Common Values
| Value | Count | Frequency (%) |
| 10228272 | 6343 | 1.3% |
| 23922797 | 2117 | 0.4% |
| 995000000000000000 | 1454 | 0.3% |
| 39922594 | 1427 | 0.3% |
| 487118986 | 1279 | 0.3% |
| 4898091 | 1211 | 0.2% |
| 18949452 | 1149 | 0.2% |
| 1652541 | 945 | 0.2% |
| 5120691 | 945 | 0.2% |
| 91478624 | 909 | 0.2% |
| Other values (62688) | 135189 | 27.0% |
| (Missing) | 347742 | 69.4% |
Words
| Value | Count | Frequency (%) |
| 10228272 | 6670 | 2.4% |
| 23922797 | 2389 | 0.9% |
| 141627220 | 2023 | 0.7% |
| 487118986 | 1952 | 0.7% |
| 228535666 | 1932 | 0.7% |
| 995000000000000000 | 1874 | 0.7% |
| 39922594 | 1576 | 0.6% |
| 18949452 | 1489 | 0.5% |
| 4898091 | 1395 | 0.5% |
| 852000000000000000 | 1343 | 0.5% |
| Other values (60267) | 250987 | 91.7% |
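mentioned_users cells hold one or more user IDs separated by spaces, so the report counts them in two ways: the Common Values table counts whole cells, while the second value table counts individual IDs. That is why 141627220 and 228535666 appear only in the second table, and why 10228272 is counted 6670 times there but 6343 times as a whole cell. The totals agree: 152968 non-missing cells plus 120662 space characters gives 273630 IDs, which equals the sum of that table (22643 listed plus 250987 other). The sketch below shows both counting schemes on the five sample rows; it assumes single-space separators and plain whitespace tokenization, which may not match the profiler's tokenizer exactly.

```python
from collections import Counter

# The five sample mentioned_users cells from the Sample table above
cells = [
    "83521919",
    "1050000000000000000",
    "1151681138",
    "1050000000000000000",
    "636689752 25159286 28109695 93540572 55725588 1348980000000000000 21425858 477017537",
]

value_counts = Counter(cells)                                   # whole cells
word_counts = Counter(tok for c in cells for tok in c.split())  # individual IDs
n_spaces = sum(c.count(" ") for c in cells)

# With single-space separators: number of IDs = non-missing cells + spaces
assert sum(word_counts.values()) == len(cells) + n_spaces
print(len(value_counts), sum(word_counts.values()), n_spaces)  # prints: 4 12 7
```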
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 974104 | 31.9% |
| 1 | 286663 | 9.4% |
| 2 | 277271 | 9.1% |
| 3 | 208661 | 6.8% |
| 8 | 207345 | 6.8% |
| 9 | 206142 | 6.7% |
| 4 | 204800 | 6.7% |
| 7 | 197967 | 6.5% |
| 6 | 187548 | 6.1% |
| 5 | 183362 | 6.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2933863 | 96.0% |
| Space Separator | 120662 | 4.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 974104 | 33.2% |
| 1 | 286663 | 9.8% |
| 2 | 277271 | 9.5% |
| 3 | 208661 | 7.1% |
| 8 | 207345 | 7.1% |
| 9 | 206142 | 7.0% |
| 4 | 204800 | 7.0% |
| 7 | 197967 | 6.7% |
| 6 | 187548 | 6.4% |
| 5 | 183362 | 6.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 120662 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3054525 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 974104 | 31.9% |
| 1 | 286663 | 9.4% |
| 2 | 277271 | 9.1% |
| 3 | 208661 | 6.8% |
| 8 | 207345 | 6.8% |
| 9 | 206142 | 6.7% |
| 4 | 204800 | 6.7% |
| 7 | 197967 | 6.5% |
| 6 | 187548 | 6.1% |
| 5 | 183362 | 6.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3054525 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 974104 | 31.9% |
| 1 | 286663 | 9.4% |
| 2 | 277271 | 9.1% |
| 3 | 208661 | 6.8% |
| 8 | 207345 | 6.8% |
| 9 | 206142 | 6.7% |
| 4 | 204800 | 6.7% |
| 7 | 197967 | 6.5% |
| 6 | 187548 | 6.1% |
| 5 | 183362 | 6.0% |
quoted_tweet
Real number (ℝ)
| Distinct | 26664 |
|---|---|
| Distinct (%) | 89.3% |
| Missing | 470842 |
| Missing (%) | 94.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.127953 × 10¹⁸ |
| Minimum | 1.2821832 × 10¹⁷ |
|---|---|
| Maximum | 1.4654097 × 10¹⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1.2821832 × 10¹⁷ |
|---|---|
| 5-th percentile | 8.2930513 × 10¹⁷ |
| Q1 | 9.7814811 × 10¹⁷ |
| median | 1.1221345 × 10¹⁸ |
| Q3 | 1.2869978 × 10¹⁸ |
| 95-th percentile | 1.4300713 × 10¹⁸ |
| Maximum | 1.4654097 × 10¹⁸ |
| Range | 1.3371914 × 10¹⁸ |
| Interquartile range (IQR) | 3.0884974 × 10¹⁷ |
Descriptive statistics
| Standard deviation | 1.9662638 × 10¹⁷ |
|---|---|
| Coefficient of variation (CV) | 0.17432143 |
| Kurtosis | -0.78782612 |
| Mean | 1.127953 × 10¹⁸ |
| Median Absolute Deviation (MAD) | 1.5599015 × 10¹⁷ |
| Skewness | -0.17659912 |
| Sum | 3.3689701 × 10²² |
| Variance | 3.8661934 × 10³⁴ |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 1.464152042 × 10¹⁸ | 109 | < 0.1% |
| 1.382246135 × 10¹⁸ | 38 | < 0.1% |
| 1.382329397 × 10¹⁸ | 24 | < 0.1% |
| 1.102405637 × 10¹⁸ | 23 | < 0.1% |
| 1.217866607 × 10¹⁸ | 23 | < 0.1% |
| 1.051216039 × 10¹⁸ | 22 | < 0.1% |
| 9.047489028 × 10¹⁷ | 19 | < 0.1% |
| 1.19248964 × 10¹⁸ | 19 | < 0.1% |
| 1.433843416 × 10¹⁸ | 15 | < 0.1% |
| 1.384821549 × 10¹⁸ | 13 | < 0.1% |
| Other values (26654) | 29563 | 5.9% |
| (Missing) | 470842 | 94.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1.282183151 × 10¹⁷ | 1 | < 0.1% |
| 4.33934586 × 10¹⁷ | 1 | < 0.1% |
| 5.121401483 × 10¹⁷ | 1 | < 0.1% |
| 5.457642099 × 10¹⁷ | 1 | < 0.1% |
| 5.482292164 × 10¹⁷ | 1 | < 0.1% |
| 5.632395412 × 10¹⁷ | 1 | < 0.1% |
| 5.65683414 × 10¹⁷ | 1 | < 0.1% |
| 5.755524979 × 10¹⁷ | 1 | < 0.1% |
| 5.812960695 × 10¹⁷ | 1 | < 0.1% |
| 5.857120753 × 10¹⁷ | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1.465409744 × 10¹⁸ | 1 | < 0.1% |
| 1.465384647 × 10¹⁸ | 1 | < 0.1% |
| 1.465370739 × 10¹⁸ | 1 | < 0.1% |
| 1.465369374 × 10¹⁸ | 1 | < 0.1% |
| 1.465368831 × 10¹⁸ | 1 | < 0.1% |
| 1.465338616 × 10¹⁸ | 1 | < 0.1% |
| 1.465330744 × 10¹⁸ | 1 | < 0.1% |
| 1.465329894 × 10¹⁸ | 1 | < 0.1% |
| 1.465317604 × 10¹⁸ | 1 | < 0.1% |
| 1.465300854 × 10¹⁸ | 1 | < 0.1% |
quoted_by_count
Real number (ℝ)
SKEWED  ZEROS 
| Distinct | 113 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.20444768 |
| Minimum | 0 |
|---|---|
| Maximum | 1953 |
| Zeros | 460723 |
| Zeros (%) | 92.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 1953 |
| Range | 1953 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 5.5307667 |
|---|---|
| Coefficient of variation (CV) | 27.052234 |
| Kurtosis | 85322.42 |
| Mean | 0.20444768 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 269.45985 |
| Sum | 102369 |
| Variance | 30.58938 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 460723 | 92.0% |
| 1 | 25850 | 5.2% |
| 2 | 6526 | 1.3% |
| 3 | 2771 | 0.6% |
| 4 | 1440 | 0.3% |
| 5 | 862 | 0.2% |
| 6 | 528 | 0.1% |
| 7 | 393 | 0.1% |
| 8 | 311 | 0.1% |
| 9 | 233 | < 0.1% |
| Other values (103) | 1073 | 0.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 460723 | 92.0% |
| 1 | 25850 | 5.2% |
| 2 | 6526 | 1.3% |
| 3 | 2771 | 0.6% |
| 4 | 1440 | 0.3% |
| 5 | 862 | 0.2% |
| 6 | 528 | 0.1% |
| 7 | 393 | 0.1% |
| 8 | 311 | 0.1% |
| 9 | 233 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1953 | 1 | < 0.1% |
| 1914 | 1 | < 0.1% |
| 1738 | 1 | < 0.1% |
| 1134 | 1 | < 0.1% |
| 895 | 1 | < 0.1% |
| 847 | 1 | < 0.1% |
| 377 | 1 | < 0.1% |
| 335 | 1 | < 0.1% |
| 327 | 1 | < 0.1% |
| 326 | 1 | < 0.1% |
credibility
Categorical
IMBALANCE  MISSING 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 349025 |
| Missing (%) | 69.7% |
| Memory size | 22.0 MiB |
| 1.0 | 145982 |
|---|---|
| 0.0 | 5703 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 455055 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 145982 | 29.2% |
| 0.0 | 5703 | 1.1% |
| (Missing) | 349025 | 69.7% |
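credibility carries the IMBALANCE flag, and the dataset overview rates it 76.9% imbalanced. That figure is reproduced by an entropy-based score, one minus the Shannon entropy of the class frequencies divided by its maximum possible value log(k) for k classes, computed over the 145982 "1.0" and 5703 "0.0" values with missing cells excluded. We infer this formula from the matching number rather than from the profiler's documentation, so treat the sketch as an assumption.

```python
import math

def imbalance_score(counts):
    """1 - H(p) / log(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

score = imbalance_score([145982, 5703])  # credibility: "1.0" vs "0.0", missing excluded
print(f"{score:.1%}")  # 76.9%
```

The log base cancels in the ratio, so natural logarithms give the same score as base-2 ones.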
Words
| Value | Count | Frequency (%) |
| 1.0 | 145982 | 96.2% |
| 0.0 | 5703 | 3.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 157388 | 34.6% |
| . | 151685 | 33.3% |
| 1 | 145982 | 32.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 303370 | 66.7% |
| Other Punctuation | 151685 | 33.3% |
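The category names are Unicode general categories. Every credibility value is "1.0" or "0.0": three characters, two of them digits (Nd, Decimal Number) and one a full stop (Po, Other Punctuation). So 151685 values yield 303370 Decimal Number and 151685 Other Punctuation characters, 455055 in total. A sketch using Python's `unicodedata` module follows; the code-to-name mapping is our own and covers only the categories used in the example.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes used below (a small subset)
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
}

def category_counts(values):
    """Count the characters of all values by Unicode general category."""
    codes = Counter(unicodedata.category(ch) for v in values for ch in v)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

print(category_counts(["1.0", "1.0", "0.0"]))
# {'Decimal Number': 6, 'Other Punctuation': 3}
print(category_counts(["Twitter Web App"]))  # a tweet_source value
```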
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 157388 | 51.9% |
| 1 | 145982 | 48.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 151685 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 455055 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 157388 | 34.6% |
| . | 151685 | 33.3% |
| 1 | 145982 | 32.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 455055 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 157388 | 34.6% |
| . | 151685 | 33.3% |
| 1 | 145982 | 32.1% |
tweet_source
Categorical
HIGH CARDINALITY  IMBALANCE 
| Distinct | 3579 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 34.2 MiB |
| Twitter Web Client | 105526 |
|---|---|
| Twitter for Android | 65309 |
| Twitter Web App | 63659 |
| Twitter for iPhone | 60734 |
| IFTTT | 24950 |
| Other values (3574) | 180532 |
Length
| Max length | 32 |
|---|---|
| Median length | 30 |
| Mean length | 14.489501 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7255038 |
|---|---|
| Distinct characters | 253 |
| Distinct categories | 15 |
| Distinct scripts | 10 |
| Distinct blocks | 9 |
Unique
| Unique | 1619 |
|---|---|
| Unique (%) | 0.3% |
Sample
| 1st row | twitterfeed |
|---|---|
| 2nd row | twitterfeed |
| 3rd row | Twitter for Websites |
| 4th row | Twitter Web Client |
| 5th row | Hootsuite |
Common Values
| Value | Count | Frequency (%) |
| Twitter Web Client | 105526 | 21.1% |
| Twitter for Android | 65309 | 13.0% |
| Twitter Web App | 63659 | 12.7% |
| Twitter for iPhone | 60734 | 12.1% |
| IFTTT | 24950 | 5.0% |
| TweetDeck | 19819 | 4.0% |
| dlvr.it | 17952 | 3.6% |
| Buffer | 13977 | 2.8% |
|  | 13027 | 2.6% |
| WordPress.com | 10769 | 2.2% |
| Other values (3569) | 104988 | 21.0% |
Words
| Value | Count | Frequency (%) |
|  | 310382 | 26.5% |
| web | 170668 | 14.6% |
| for | 140332 | 12.0% |
| client | 105532 | 9.0% |
| app | 67365 | 5.8% |
| android | 66141 | 5.7% |
| iphone | 60755 | 5.2% |
| ifttt | 24950 | 2.1% |
| tweetdeck | 19819 | 1.7% |
| hootsuite | 18196 | 1.6% |
| Other values (4088) | 185625 | 15.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 860813 | 11.9% |
| e | 856723 | 11.8% |
| (space) | 669503 | 9.2% |
| i | 652249 | 9.0% |
| r | 615615 | 8.5% |
| T | 414215 | 5.7% |
| o | 413677 | 5.7% |
| w | 357485 | 4.9% |
| n | 276349 | 3.8% |
| b | 199230 | 2.7% |
| Other values (243) | 1939179 | 26.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5371299 | 74.0% |
| Uppercase Letter | 1146737 | 15.8% |
| Space Separator | 669599 | 9.2% |
| Other Punctuation | 47277 | 0.7% |
| Decimal Number | 13835 | 0.2% |
| Other Letter | 1706 | < 0.1% |
| Dash Punctuation | 1465 | < 0.1% |
| Connector Punctuation | 1444 | < 0.1% |
| Close Punctuation | 569 | < 0.1% |
| Open Punctuation | 569 | < 0.1% |
| Other values (5) | 538 | < 0.1% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| ス | 185 | 10.8% |
| ト | 110 | 6.4% |
| マ | 93 | 5.5% |
| ニ | 93 | 5.5% |
| ュ | 93 | 5.5% |
| 動 | 81 | 4.7% |
| 自 | 81 | 4.7% |
| 用 | 77 | 4.5% |
| 稿 | 77 | 4.5% |
| 投 | 77 | 4.5% |
| Other values (121) | 739 | 43.3% |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 860813 | 16.0% |
| e | 856723 | 16.0% |
| i | 652249 | 12.1% |
| r | 615615 | 11.5% |
| o | 413677 | 7.7% |
| w | 357485 | 6.7% |
| n | 276349 | 5.1% |
| b | 199230 | 3.7% |
| d | 196616 | 3.7% |
| f | 179979 | 3.4% |
| Other values (42) | 762563 | 14.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 414215 | 36.1% |
| W | 185095 | 16.1% |
| A | 136894 | 11.9% |
| C | 109663 | 9.6% |
| P | 90600 | 7.9% |
| F | 43546 | 3.8% |
| I | 41707 | 3.6% |
| D | 22178 | 1.9% |
| S | 20445 | 1.8% |
| B | 19922 | 1.7% |
| Other values (23) | 62472 | 5.4% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 44679 | 94.5% |
| : | 1375 | 2.9% |
| ! | 549 | 1.2% |
| , | 363 | 0.8% |
| / | 131 | 0.3% |
| @ | 84 | 0.2% |
| ' | 58 | 0.1% |
| & | 31 | 0.1% |
| # | 3 | < 0.1% |
| ・ | 2 | < 0.1% |
| Other values (2) | 2 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 2148 | 15.5% |
| 1 | 1801 | 13.0% |
| 0 | 1747 | 12.6% |
| 3 | 1568 | 11.3% |
| 4 | 1439 | 10.4% |
| 5 | 1280 | 9.3% |
| 6 | 1093 | 7.9% |
| 7 | 972 | 7.0% |
| 8 | 922 | 6.7% |
| 9 | 865 | 6.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 669503 | > 99.9% |
|  | 95 | < 0.1% |
|  | 1 | < 0.1% |
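Rows for whitespace characters have no visible value. The most frequent one (669503, also listed under the ASCII block) can only be the ordinary space, U+0020, since that is the only ASCII space separator; the report does not reveal which non-ASCII space characters make up the other two rows. Printing each character's code point and Unicode name makes such entries identifiable, as in this sketch on a hypothetical input string.

```python
import unicodedata

def describe_chars(text):
    """Return (code point, Unicode name, general category) for each distinct character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in sorted(set(text))
    ]

# Hypothetical source name mixing an ordinary space and a no-break space (U+00A0)
for row in describe_chars("Twitter\u00a0for iPad"):
    print(row)
```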
Other Symbol
| Value | Count | Frequency (%) |
| ® | 209 | 97.2% |
| © | 5 | 2.3% |
| 🤖 | 1 | 0.5% |
Math Symbol
| Value | Count | Frequency (%) |
| \| | 94 | 97.9% |
| + | 2 | 2.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1465 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 1444 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 569 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 569 | 100.0% |
Modifier Letter
| Value | Count | Frequency (%) |
| ー | 212 | 100.0% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 13 | 100.0% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ิ | 2 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6517727 | 89.8% |
| Common | 735294 | 10.1% |
| Katakana | 774 | < 0.1% |
| Han | 690 | < 0.1% |
| Greek | 297 | < 0.1% |
| Hiragana | 142 | < 0.1% |
| Arabic | 52 | < 0.1% |
| Hangul | 28 | < 0.1% |
| Thai | 22 | < 0.1% |
| Cyrillic | 12 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 860813 | 13.2% |
| e | 856723 | 13.1% |
| i | 652249 | 10.0% |
| r | 615615 | 9.4% |
| T | 414215 | 6.4% |
| o | 413677 | 6.3% |
| w | 357485 | 5.5% |
| n | 276349 | 4.2% |
| b | 199230 | 3.1% |
| d | 196616 | 3.0% |
| Other values (54) | 1674755 | 25.7% |
Han
| Value | Count | Frequency (%) |
| 動 | 81 | 11.7% |
| 自 | 81 | 11.7% |
| 用 | 77 | 11.2% |
| 稿 | 77 | 11.2% |
| 投 | 77 | 11.2% |
| 销 | 14 | 2.0% |
| 台 | 14 | 2.0% |
| 平 | 14 | 2.0% |
| 营 | 14 | 2.0% |
| 体 | 11 | 1.6% |
| Other values (49) | 230 | 33.3% |
Common
| Value | Count | Frequency (%) |
| (space) | 669503 | 91.1% |
| . | 44679 | 6.1% |
| 2 | 2148 | 0.3% |
| 1 | 1801 | 0.2% |
| 0 | 1747 | 0.2% |
| 3 | 1568 | 0.2% |
| - | 1465 | 0.2% |
| _ | 1444 | 0.2% |
| 4 | 1439 | 0.2% |
| : | 1375 | 0.2% |
| Other values (26) | 8125 | 1.1% |
Katakana
| Value | Count | Frequency (%) |
| ス | 185 | 23.9% |
| ト | 110 | 14.2% |
| マ | 93 | 12.0% |
| ニ | 93 | 12.0% |
| ュ | 93 | 12.0% |
| イ | 40 | 5.2% |
| ツ | 23 | 3.0% |
| タ | 21 | 2.7% |
| ィ | 16 | 2.1% |
| デ | 16 | 2.1% |
| Other values (10) | 84 | 10.9% |
Arabic
| Value | Count | Frequency (%) |
| ا | 10 | 19.2% |
| ر | 7 | 13.5% |
| ن | 6 | 11.5% |
| ی | 6 | 11.5% |
| ب | 3 | 5.8% |
| خ | 3 | 5.8% |
| ژ | 3 | 5.8% |
| آ | 3 | 5.8% |
| س | 3 | 5.8% |
| ت | 2 | 3.8% |
| Other values (6) | 6 | 11.5% |
Hangul
| Value | Count | Frequency (%) |
| 어 | 6 | 21.4% |
| 판 | 3 | 10.7% |
| 스 | 3 | 10.7% |
| 뉴 | 2 | 7.1% |
| 와 | 2 | 7.1% |
| 이 | 2 | 7.1% |
| 디 | 1 | 3.6% |
| 기 | 1 | 3.6% |
| 술 | 1 | 3.6% |
| 미 | 1 | 3.6% |
| Other values (6) | 6 | 21.4% |
Greek
| Value | Count | Frequency (%) |
| Ο | 282 | 94.9% |
| ρ | 2 | 0.7% |
| ί | 2 | 0.7% |
| α | 2 | 0.7% |
| Τ | 1 | 0.3% |
| π | 1 | 0.3% |
| ο | 1 | 0.3% |
| λ | 1 | 0.3% |
| η | 1 | 0.3% |
| Α | 1 | 0.3% |
| Other values (3) | 3 | 1.0% |
Hiragana
| Value | Count | Frequency (%) |
| い | 23 | 16.2% |
| は | 17 | 12.0% |
| る | 15 | 10.6% |
| ぷ | 15 | 10.6% |
| っ | 15 | 10.6% |
| つ | 15 | 10.6% |
| な | 13 | 9.2% |
| て | 13 | 9.2% |
| お | 8 | 5.6% |
| に | 4 | 2.8% |
Thai
| Value | Count | Frequency (%) |
| ฐ | 4 | 18.2% |
| เ | 2 | 9.1% |
| จ | 2 | 9.1% |
| ิ | 2 | 9.1% |
| ก | 2 | 9.1% |
| ษ | 2 | 9.1% |
| ร | 2 | 9.1% |
| น | 2 | 9.1% |
| า | 2 | 9.1% |
| ศ | 2 | 9.1% |
Cyrillic
| Value | Count | Frequency (%) |
| О | 4 | 33.3% |
| о | 2 | 16.7% |
| в | 1 | 8.3% |
| т | 1 | 8.3% |
| с | 1 | 8.3% |
| Н | 1 | 8.3% |
| и | 1 | 8.3% |
| С | 1 | 8.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7252280 | > 99.9% |
| Katakana | 988 | < 0.1% |
| None | 825 | < 0.1% |
| CJK | 690 | < 0.1% |
| Hiragana | 142 | < 0.1% |
| Arabic | 52 | < 0.1% |
| Hangul | 27 | < 0.1% |
| Thai | 22 | < 0.1% |
| Cyrillic | 12 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 860813 | 11.9% |
| e | 856723 | 11.8% |
| (space) | 669503 | 9.2% |
| i | 652249 | 9.0% |
| r | 615615 | 8.5% |
| T | 414215 | 5.7% |
| o | 413677 | 5.7% |
| w | 357485 | 4.9% |
| n | 276349 | 3.8% |
| b | 199230 | 2.7% |
| Other values (70) | 1936421 | 26.7% |
None
| Value | Count | Frequency (%) |
| Ο | 282 | 34.2% |
| ® | 209 | 25.3% |
| ó | 126 | 15.3% |
|  | 95 | 11.5% |
| ŋ | 50 | 6.1% |
| ´ | 13 | 1.6% |
| í | 9 | 1.1% |
| ĭ | 6 | 0.7% |
| © | 5 | 0.6% |
| é | 3 | 0.4% |
| Other values (22) | 27 | 3.3% |
Katakana
| Value | Count | Frequency (%) |
| ー | 212 | 21.5% |
| ス | 185 | 18.7% |
| ト | 110 | 11.1% |
| マ | 93 | 9.4% |
| ニ | 93 | 9.4% |
| ュ | 93 | 9.4% |
| イ | 40 | 4.0% |
| ツ | 23 | 2.3% |
| タ | 21 | 2.1% |
| ィ | 16 | 1.6% |
| Other values (12) | 102 | 10.3% |
CJK
| Value | Count | Frequency (%) |
| 動 | 81 | 11.7% |
| 自 | 81 | 11.7% |
| 用 | 77 | 11.2% |
| 稿 | 77 | 11.2% |
| 投 | 77 | 11.2% |
| 销 | 14 | 2.0% |
| 台 | 14 | 2.0% |
| 平 | 14 | 2.0% |
| 营 | 14 | 2.0% |
| 体 | 11 | 1.6% |
| Other values (49) | 230 | 33.3% |
Hiragana
| Value | Count | Frequency (%) |
| い | 23 | 16.2% |
| は | 17 | 12.0% |
| る | 15 | 10.6% |
| ぷ | 15 | 10.6% |
| っ | 15 | 10.6% |
| つ | 15 | 10.6% |
| な | 13 | 9.2% |
| て | 13 | 9.2% |
| お | 8 | 5.6% |
| に | 4 | 2.8% |
Arabic
| Value | Count | Frequency (%) |
| ا | 10 | 19.2% |
| ر | 7 | 13.5% |
| ن | 6 | 11.5% |
| ی | 6 | 11.5% |
| ب | 3 | 5.8% |
| خ | 3 | 5.8% |
| ژ | 3 | 5.8% |
| آ | 3 | 5.8% |
| س | 3 | 5.8% |
| ت | 2 | 3.8% |
| Other values (6) | 6 | 11.5% |
Hangul
| Value | Count | Frequency (%) |
| 어 | 6 | 22.2% |
| 판 | 3 | 11.1% |
| 스 | 3 | 11.1% |
| 뉴 | 2 | 7.4% |
| 와 | 2 | 7.4% |
| 이 | 2 | 7.4% |
| 디 | 1 | 3.7% |
| 기 | 1 | 3.7% |
| 술 | 1 | 3.7% |
| 미 | 1 | 3.7% |
| Other values (5) | 5 | 18.5% |
Thai
| Value | Count | Frequency (%) |
| ฐ | 4 | 18.2% |
| เ | 2 | 9.1% |
| จ | 2 | 9.1% |
| ิ | 2 | 9.1% |
| ก | 2 | 9.1% |
| ษ | 2 | 9.1% |
| ร | 2 | 9.1% |
| น | 2 | 9.1% |
| า | 2 | 9.1% |
| ศ | 2 | 9.1% |
Cyrillic
| Value | Count | Frequency (%) |
| О | 4 | 33.3% |
| о | 2 | 16.7% |
| в | 1 | 8.3% |
| т | 1 | 8.3% |
| с | 1 | 8.3% |
| Н | 1 | 8.3% |
| и | 1 | 8.3% |
| С | 1 | 8.3% |
First rows
|  | user_id | timestamp | tweet_id | sentiment_polarity | text_lang_ft | text_normalized | links | hashtag | hashtag_lang | hashtag_en | cashtag | media | image_url | video_url | GIF_url | likes | retweets | replies | reply_to_user | mentioned_users | quoted_tweet | quoted_by_count | credibility | tweet_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 116680494 | 2013-09-03 02:22:09+00:00 | 374718928682885121 | 0.2732 | en 88 | ['nation', 'agree', 'build', 'new', 'silk', 'road', 'china', 'enhance', 'partnership', 'neighbor', 'west', 'aim'] | http://bit.ly/17lyTPM | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 1.0 | twitterfeed |
| 1 | 172576367 | 2013-09-03 02:22:11+00:00 | 374718937889402880 | 0.2732 | en 84 | ['nation', 'agree', 'build', 'new', 'silk', 'road', 'china', 'enhance', 'partnership', 'neighbor', 'west', 'aim'] | http://bit.ly/17lySv6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 1.0 | twitterfeed |
| 2 | 154226261 | 2013-09-03 10:11:50+00:00 | 374837127873175553 | 0.0000 | en 47 | ['high', 'speed', 'rail', 'china', 'new', 'silk', 'road', 'perspective'] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 0 | 0 | NaN | 83521919 | NaN | 0 | NaN | Twitter for Websites |
| 3 | 61733677 | 2013-09-03 11:33:26+00:00 | 374857665735704576 | 0.2732 | en 65 | ['nation', 'agree', 'build', 'new', 'silk', 'road'] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | NaN | Twitter Web Client |
| 4 | 87775422 | 2013-09-03 20:10:51+00:00 | 374987876737765376 | 0.0000 | en 56 | ['china', 'kazakhstan', 'tajikistan', 'russia', 'mongolia', 'build', 'new', 'silk', 'road'] | http://usa.chinadaily.com.cn/epaper/2013-09/03/content_16940556.htm | China | en 50 | China | NaN | NaN | NaN | NaN | NaN | 2 | 6 | 0 | NaN | NaN | NaN | 0 | 0.0 | Hootsuite |
| 5 | 241024381 | 2013-09-03 20:30:54+00:00 | 374992924548685824 | 0.5859 | en 49 | ['xijinpe', 'tour', 'central', 'asia', 'aim', 'boost', 'energy', 'cooperation', 'pare', 'non', 'solo', 'newsilkroad'] | NaN | Asia energy NewSilkRoad | en 48 | Asia energy NewSilkRoad | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | NaN | Twitter Web Client |
| 6 | 848121642 | 2013-09-03 22:26:32+00:00 | 375022023241908224 | 0.4939 | en 40 | ['nation', 'agree', 'build', 'new', 'silk', 'roadbusiness', 'china', 'asia', 'energy'] | NaN | china asia energy | en 41 | china asia energy | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | 1050000000000000000 | NaN | 0 | NaN | toptoptopics |
| 7 | 195642840 | 2013-09-03 23:30:04+00:00 | 375038010070298627 | 0.0000 | en 60 | ['china', 'kazakhstan', 'tajikistan', 'russia', 'mongolia', 'build', 'new', 'silk', 'road'] | http://usa.chinadaily.com.cn/epaper/2013-09/03/content_16940556.htm | China | en 50 | China | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 0.0 | Twitter for Android |
| 8 | 29617875 | 2013-09-04 00:04:47+00:00 | 375046748600684544 | 0.3818 | en 77 | ['mongolia', 'china', 'russia', 'nation', 'build', 'new', 'silk', 'road', 'accelerate', 'economic', 'recovery', 'promote', 'trade'] | NaN | Mongolia | en 18 | Mongolia | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | NaN | Twitter Web Client |
| 9 | 256478435 | 2013-09-04 00:18:01+00:00 | 375050078136070144 | 0.2732 | en 29 | ['nation', 'agree', 'build', 'new', 'silk', 'roadbusiness'] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | NaN | Hootsuite |
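
In these rows, `text_lang_ft` and `hashtag_lang` pack a language code and an integer into one string (for example `en 65`). `text_normalized` holds a Python-style list of tokens stored as text, and the profile counts it as a categorical column rather than a list. The sketch below splits these cells back into usable values. It rests on two assumptions. First, the integer is treated as a confidence-like score. The `_ft` suffix suggests fastText language identification, but that is a guess. Second, the list cells are assumed to be valid Python literals. `parse_lang_ft`, `parse_token_list` and `is_missing` are hypothetical helpers, not part of any library.

```python
import ast
import math
from typing import List, Optional, Tuple


def is_missing(value) -> bool:
    """True for None or a float NaN, the two forms missing cells take here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def parse_lang_ft(value) -> Optional[Tuple[str, int]]:
    """Split a cell such as 'en 65' into ('en', 65).

    Assumes the '<language code> <integer score>' layout seen in the
    text_lang_ft and hashtag_lang columns; anything else returns None.
    """
    if is_missing(value):
        return None
    parts = str(value).split()
    if len(parts) != 2 or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1])


def parse_token_list(value) -> List[str]:
    """Turn a text_normalized cell like "['nation', 'agree']" into a list.

    ast.literal_eval evaluates only Python literals, so the cell text
    cannot execute code. Missing cells become an empty list.
    """
    if is_missing(value):
        return []
    if isinstance(value, list):
        return [str(tok) for tok in value]
    return [str(tok) for tok in ast.literal_eval(value)]


# Cells copied from sample row 5 above.
print(parse_lang_ft("en 49"))  # ('en', 49)
print(parse_token_list("['xijinpe', 'tour', 'central', 'asia']"))
# ['xijinpe', 'tour', 'central', 'asia']
```

The same `parse_lang_ft` helper applies to `hashtag_lang`, which uses the same layout in these rows (for example `en 48`).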

Last rows
| | user_id | timestamp | tweet_id | sentiment_polarity | text_lang_ft | text_normalized | links | hashtag | hashtag_lang | hashtag_en | cashtag | media | image_url | video_url | GIF_url | likes | retweets | replies | reply_to_user | mentioned_users | quoted_tweet | quoted_by_count | credibility | tweet_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 500700 | 86197418 | 2019-10-09 11:24:36+00:00 | 1181893218935459843 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 1 | 1068680000000000000.0 | 8628872 | NaN | 0 | NaN | Twitter Web App |
| 500701 | 3097165470 | 2020-03-30 00:58:59+00:00 | 1244428878895960065 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [Photo(previewUrl='https://pbs.twimg.com/media/EUUZUyvU8AAHVBX?format=png&name=small', fullUrl='https://pbs.twimg.com/media/EUUZUyvU8AAHVBX?format=png&name=large'), Photo(previewUrl='https://pbs.twimg.com/media/EUUZXIiU0AAhYNB?format=png&name=small', fullUrl='https://pbs.twimg.com/media/EUUZXIiU0AAhYNB?format=png&name=large'), Photo(previewUrl='https://pbs.twimg.com/media/EUUZap9UUAEvxUf?format=png&name=small', fullUrl='https://pbs.twimg.com/media/EUUZap9UUAEvxUf?format=png&name=large'), Photo(previewUrl='https://pbs.twimg.com/media/EUUZdJ8UcAArYWU?format=png&name=small', fullUrl='https://pbs.twimg.com/media/EUUZdJ8UcAArYWU?format=png&name=large')] | NaN | NaN | NaN | 1 | 0 | 0 | NaN | NaN | NaN | 0 | NaN | Twitter Web App |
| 500702 | 886946654 | 2020-07-11 00:20:55+00:00 | 1281745248562151426 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | NaN | Twitter for iPad |
| 500703 | 954000000000000000 | 2020-07-24 12:14:34+00:00 | 1286635884981493761 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 1 | 44123487.0 | 44123487 154016912 | NaN | 0 | NaN | Twitter Web App |
| 500704 | 1030000000000000000 | 2020-10-30 13:55:46+00:00 | 1322175364752240642 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 954000000000000000.0 | 174872895 1180000000000000000 17629860 174872895 | NaN | 0 | NaN | Twitter Web App |
| 500705 | 2260128516 | 2020-12-14 09:14:43+00:00 | 1338412091988992001 | NaN | NaN | NaN | NaN | ['OOTT', 'OilPrices', 'OPEC', 'energy', 'COVID19', 'Bitcoin', 'GIEnergyOutlook21'] | NaN | NaN | NaN | [Video(thumbnailUrl='https://pbs.twimg.com/ext_tw_video_thumb/1338398795126763523/pu/img/MWptwOyFp2EI5RWv.jpg', variants=[VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1338398795126763523/pu/vid/658x360/BPyQi0cWMVjEygMv.mp4?tag=10', bitrate=832000), VideoVariant(contentType='application/x-mpegURL', url='https://video.twimg.com/ext_tw_video/1338398795126763523/pu/pl/TuUIlS6iBjnEDO1n.m3u8?tag=10', bitrate=None), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1338398795126763523/pu/vid/1318x720/RiDgDT9TmUCsYyYm.mp4?tag=10', bitrate=2176000), VideoVariant(contentType='video/mp4', url='https://video.twimg.com/ext_tw_video/1338398795126763523/pu/vid/494x270/N4JrGtnxK-YUYPj-.mp4?tag=10', bitrate=256000)], duration=28.228, views=77)] | NaN | NaN | NaN | 8 | 5 | 1 | 1029750000000000000.0 | NaN | NaN | 0 | NaN | Twitter Web App |
| 500706 | 1170000000000000000 | 2021-01-30 16:35:10+00:00 | 1355555164686413826 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6 | 1 | 3 | NaN | 4375297954 2487093234 710000000000000000 | NaN | 0 | NaN | Twitter Web App |
| 500707 | 182910373 | 2021-05-04 00:16:16+00:00 | 1389373273935433731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 34 | 11 | 6 | NaN | 1190000000000000000 3097165470 10228272 | 1.389222e+18 | 0 | NaN | Twitter for Android |
| 500708 | 517645353 | 2021-05-17 11:00:35+00:00 | 1394246465724289024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 1 | 0 | NaN | NaN | NaN | 0 | NaN | Twitter for iPhone |
| 500709 | 48534642 | 2021-10-04 17:49:32+00:00 | 1445083681991843857 | NaN | NaN | NaN | NaN | ['Turkey'] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 34 | 8 | 1 | NaN | NaN | NaN | 0 | NaN | Twitter Web App |
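
`mentioned_users` packs several user IDs into one space-separated string. Some IDs in these rows end in long runs of zeros, for example `1180000000000000000` in `mentioned_users` of row 500704 and the `user_id` `954000000000000000` of row 500703. That pattern is consistent with the values having passed through a floating-point or rounded scientific-notation representation before export. The `.0` suffix on `reply_to_user` is consistent with pandas storing an integer column that contains NaN as float64. The sketch below splits the column and flags suspicious IDs. The trailing-zero threshold is an arbitrary heuristic, not part of any dataset specification. The helper names are hypothetical.

```python
import math
from typing import List

# Assumption (arbitrary heuristic): a 64-bit Twitter ID ending in this many
# zeros has almost certainly been rounded somewhere upstream.
TRAILING_ZERO_THRESHOLD = 10


def is_missing(value) -> bool:
    """True for None or a float NaN, the two forms missing cells take here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def parse_mentioned_users(value) -> List[int]:
    """Split a space-separated mentioned_users cell into integer IDs."""
    if is_missing(value):
        return []
    if isinstance(value, (int, float)):
        # A cell holding a single ID may have been parsed as a number.
        return [int(value)]
    return [int(tok) for tok in str(value).split()]


def looks_rounded(user_id: int) -> bool:
    """Flag IDs that end in a long run of zeros (likely precision loss)."""
    digits = str(user_id)
    return len(digits) - len(digits.rstrip("0")) >= TRAILING_ZERO_THRESHOLD


# mentioned_users cell from sample row 500704.
ids = parse_mentioned_users("174872895 1180000000000000000 17629860 174872895")
print([i for i in ids if looks_rounded(i)])  # [1180000000000000000]

# IDs above 2**53 do not survive a float64 round trip (tweet_id of row 500700).
tweet_id = 1181893218935459843
print(int(float(tweet_id)) == tweet_id)  # False
```

Flagged IDs cannot be restored from these cells. If the raw source is available, reading the ID columns as strings when loading it avoids this kind of loss.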